I. Introduction

On August 14, 2003, just after 4 p.m. Eastern Daylight Time (EDT),¹ the North American power grid
experienced its largest blackout ever. The blackout affected an estimated 50 million people and more 
than 70,000 megawatts (MW) of electrical load in parts of Ohio, Michigan, New York, Pennsylvania, 
New Jersey, Connecticut, Massachusetts, Vermont, and the Canadian provinces of Ontario and Quebec. 
Although power was successfully restored to most customers within hours, some areas in the United 
States did not have power for two days and parts of Ontario experienced rotating blackouts for up to two 
weeks. 

This report looks at the conditions on the bulk electric system that existed prior to and during the 
blackout, and explains how the blackout occurred. The report concludes with a series of 
recommendations for actions that can and should be taken by the electric industry to prevent or minimize 
the chance of such an outage occurring in the future. 

A. NERC Investigation 

1. Scope of Investigation 

Historically, blackouts and other significant electric system events have been investigated by the affected 
regional reliability councils. The NERC Disturbance Analysis Working Group would then review the 
regional reports and prepare its own evaluation of the broader lessons learned. The August 14 blackout 
was unique with regard to its magnitude and the fact that it affected three NERC regions. The scope and 
depth of NERC’s investigation into a blackout of this magnitude was unprecedented. 

Immediately following the blackout, NERC assembled a team of technical experts from across the United 
States and Canada to investigate exactly what happened, why it happened, and what could be done to 
minimize the chance of future outages. To lead this effort, NERC established a steering group of leading 
experts from organizations that were not directly affected by the cascading grid failure. 

The scope of NERC’s investigation was to determine the causes of the blackout, how to reduce the 
likelihood of future cascading blackouts, and how to minimize the impacts of any that do occur. NERC 
focused its analysis on factual and technical issues including power system operations, planning, design, 
protection and control, and maintenance. Because it is the responsibility of all power system operating 
entities to operate the electric system reliably at all times, irrespective of regulatory, economic, or market 
factors, the NERC investigation did not address regulatory, economic, market structure, or policy issues. 

2. Support for U.S.-Canada Power System Outage Task Force 

NERC’s technical investigation became a critical component of the U.S.-Canada Power System Outage 
Task Force, a bi-national group formed to examine all aspects of the August 14 outage. The Task Force 
formed three working groups to investigate the electric power system, nuclear power plant, and security 
aspects of the blackout. The electric system working group was led by representatives from the U.S. 
Department of Energy, the U.S. Federal Energy Regulatory Commission, and Natural Resources Canada. 

The NERC investigation provided support to the electric system working group, analyzing enormous 
volumes of data to determine a precise sequence of events leading to and during the cascade. The NERC
teams met regularly with representatives of the Task Force to determine why the blackout occurred and
why it extended as far as it did.

1 All times referenced in this report have been converted to Eastern Daylight Time.

In its November 19 interim report, the Task Force concluded, and NERC concurred, that the initiating 
causes of the blackout were 1) that FirstEnergy (FE) lost functionality of its critical monitoring tools and 
as a result lacked situational awareness of degraded conditions on its transmission system, 2) that FE did 
not adequately manage tree growth in its transmission rights-of-way, 3) that the Midwest Independent 
System Operator (MISO) reliability coordinator did not provide adequate diagnostic support, and 4) that 
coordination between the MISO and PJM reliability coordinators was ineffective. The report cited 
several violations of NERC reliability standards as contributing to the blackout. 

After the interim report was issued, NERC continued to support the electric system working group.
NERC also began to develop its own technical report and a set of recommendations to address issues
identified in the investigation.

3. Investigation Organization 

Before the electric system had been fully restored, NERC began to organize its investigation. NERC 
appointed a steering group of industry leaders with extensive executive experience, power system 
expertise, and objectivity. This group was asked to formulate the investigation plan and scope, and to 
oversee NERC’s blackout investigation. 

NERC’s initial efforts focused on collecting system data to establish a precise sequence of events leading 
up to the blackout. In the initial stage of the investigation, investigators began to build a sequence of 
events from information that was then available from NERC regions and from reliability coordinators. It
quickly became apparent, however, that additional resources would be needed to complete such a
large-scale investigation. The investigation was augmented with individuals from the affected areas that had
knowledge of their system design, configuration, protection, and operations. Having this first-hand 
expertise was critical in developing the initial sequence of events. These experts were added to the 
investigation teams and each team was assigned to build a sequence of events for a specific geographic 
area. As the sequence of events became more detailed, a database was created to facilitate management 
of the data and to reconcile conflicting time stamps on the thousands of events that occurred in the time 
leading up to and during the power system failure. 

The NERC Steering Group organized investigators into teams to analyze discrete events requiring 
specific areas of expertise, as shown in Figure 1.1. To fill these teams, NERC called on industry 
volunteers. The number and quality of experts who answered the call was extraordinary. Many of these 
volunteers relocated temporarily to Princeton, New Jersey, to allow for close collaboration during the 
investigation. The teams dedicated long hours — often seven days per week — over several months to 
analyze what happened and why. The investigators operated with complete autonomy to investigate all 
possible causes of the blackout. The investigation methods were systematic — investigators “looked 
under every rock” and methodically proved or disproved each theory put forth as to why and how the 
blackout occurred. 

Figure 1.1 — NERC Blackout Investigation Organization


4. Investigation Process 

Under the guidance of the Steering Group, NERC developed a formal investigation plan. The 
investigation plan assigned work scopes, deliverables, and milestones for each investigation team. The 
major elements of the investigation process are summarized here: 

• In the first days after the blackout, NERC and the reliability coordinators conferred by hotline 
calls to assess the status of system restoration, the continuing capacity shortage and rotating 
blackouts, and initial information on what had happened. 

• On August 17, NERC notified all reliability coordinators and control areas in the blackout area to 
retain state estimator, relay, and fault recorder data from 08:00 to 17:00 on August 14. A 
subsequent request added event logs, one-line diagrams, and system maps to the list. On August 
22, NERC issued a more substantive data request for the hours of 08:00 to 22:00 on August 14. 
Additional data requests were made as the investigation progressed. The response to the data 
requests was excellent; many entities submitted more information related to the blackout than was 
requested. To manage the enormous volume of data, NERC installed additional computers and a 
relational database, and assigned a team to catalog and manage the data. 

• As part of the U.S.-Canada Power System Outage Task Force investigation and in cooperation 
with NERC, the U.S. Department of Energy conducted onsite interviews with operators, 
engineers, computer staff, supervisors, and others at all of the affected reliability coordinator and 
control area operating centers. 

• The analysis portion of the investigation began with the development of a sequence of events.
The initial focus was on the critical events leading up to the power system cascade. The task was
painstakingly arduous due to the large volume of event data and the limited amount of
information that was precisely synchronized to a national time standard. Assembling the timeline
to the level of accuracy needed for the remaining areas of investigation was analogous to
completing a jigsaw puzzle with thousands of unique interlocking pieces. The initial sequence of
events was published on September 12, 2003.

• NERC established teams to analyze different aspects of the blackout. Each team was assigned a 
scope and the necessary experts to complete its mission. The teams interacted frequently with 
investigation leaders and with the co-chairs of the electric system working group. 

• A cornerstone of the investigation was a root cause analysis sponsored by the U.S. Department of 
Energy and facilitated by a contractor with expertise in that area. This systematic approach 
served to focus the investigation teams on proving the causes of the blackout based on verified 
facts. The work of these investigation teams was based on a) gathering data; b) verifying facts 
through multiple, independent sources; c) performing analysis and simulations with the data; and 
d) conducting an exhaustive forensic analysis of the causes of the blackout. 

• NERC assisted the U.S.-Canada Power System Outage Task Force in conducting a series of 
information-gathering meetings on August 22, September 8-9, and October 1-3. These meetings 
were open only to invited entities; each meeting was recorded, and a transcription prepared for 
later use by investigators. The first meeting focused on assessing what was known about the 
blackout sequence and its causes, and identifying additional information requirements. The 
second meeting focused on technical issues framed around a set of questions directed to each 
entity operating in the blackout area. The third meeting focused on verifying detailed information 
to support the root cause analysis. Participation was narrowed to include only the investigators 
and representatives of the FirstEnergy and AEP control areas, and the Midwest Independent 
Transmission System Operator (MISO) and PJM reliability coordinators. 

• On October 15, 2003, NERC issued a letter to all reliability coordinators and system operating 
entities that required them to address some of the key issues arising from the investigation. 

• On November 19, 2003, the U.S.-Canada Power System Outage Task Force issued its interim 
report on the events and causes of the August 14 blackout. The report was developed in 
collaboration with the NERC investigation and NERC concurred with the report’s findings. 

• The second phase of the blackout investigation began after the interim report was released. For 
NERC, the second phase focused on two areas. First, NERC continued to analyze why the 
cascade started and spread as far as it did. The results of this analysis were incorporated into this
report and also provided to the U.S.-Canada Power System Outage Task Force for inclusion in its 
final report, which was issued on April 5, 2004. NERC also began, independently of the Task 
Force, to develop an initial set of recommendations to minimize the risk and mitigate the impacts 
of possible future cascading failures. These recommendations were approved by the NERC 
Board of Trustees on February 10, 2004. 

5. Coordination with NERC Regions 

The NERC regions and the regional transmission organizations (RTOs) within these regions played an 
important role in the NERC investigation; these entities also conducted their own analyses of the events 
that occurred within their regions. The regions provided a means to identify all facility owners and to 
collect the necessary data. Regular conference calls were held to coordinate the NERC and regional 
investigations and share results. 

The NERC regions provided expert resources for system modeling and simulation and other aspects of 
the analysis. The investigation relied on the multi-regional MAAC-ECAR-NPCC Operations Studies 
Working Group, which had developed summer loading models of the systems affected by the blackout.
Other groups, such as the SS-38 Task Force (system dynamics data) and Major System Disturbance Task 
Force, provided valuable assistance to the investigation. 

The restoration phase of the blackout was successful, and NERC has deferred the bulk of the analysis of 
system restoration efforts to the regions, RTOs, and operating entities. Evaluation of the restoration is a 
significant effort that requires analyzing the effectiveness of thousands of actions against local and 
regional restoration plans. The results of this analysis will be consolidated by NERC and reported at a 
future date. 

6. Ongoing Dynamic Investigation 

The electrical dynamics of the blackout warrant unprecedented detailed technical analysis. The MAAC- 
ECAR-NPCC Major System Disturbance Task Force continues to analyze the dynamic swings in voltage, 
power flows, and other events captured by high-speed disturbance recorders. The results of that work will 
be published as they become available. 

B. Report Overview 

The report begins by telling a detailed story of the blackout, outlining what happened, and why. This
portion of the report is organized into three sections: Section II describes system conditions on August 14 
prior to the blackout, Section III describes events in northeastern Ohio that triggered the start of an 
uncontrolled cascade of the power system, and Section IV describes the ensuing cascade. The report 
concludes in Section V with a summary of the causes of the blackout, contributing factors, and other 
deficiencies. This section also provides a set of NERC recommendations. The majority of these 
recommendations were approved on February 10, 2004; however, several new recommendations have 
been added. Supplemental reports prepared by the investigation teams are under development and will be
available in phase II of this report.

A report on vegetation management issues developed by the U.S.-Canada Power System Outage Task 
Force is an additional reference that complements this report. 

C. Key Entities Affected by the August 14 Blackout 

1. Electric Systems Affected by the Blackout 

The August 14 blackout affected the northeastern portion of the Eastern Interconnection, covering 
portions of three NERC regions. The blackout affected electric systems in northern Ohio, eastern 
Michigan, northern Pennsylvania and New Jersey, much of New York and Ontario. To a lesser extent, 
Massachusetts, Connecticut, Vermont, and Quebec were impacted. The areas affected by the August 14 
blackout are shown in Figure 1.2. 

Figure 1.2 — Area Affected by the Blackout


The power system in Ontario is operated by the Independent Market Operator (IMO). The New York
system is operated by the New York Independent System Operator (NYISO). The mid-Atlantic area,
including the northern Pennsylvania and northern New Jersey areas affected by the blackout, is operated
by the PJM Interconnection, LLC (PJM). Each of these entities operates an electricity market in its
respective area and is responsible for the reliability of the bulk electric system in that area. Each is
designated as both the system operator and the reliability coordinator for its respective area.

In the Midwest, several dozen utilities operate their own systems in their franchise territory. Reliability 
oversight in this region is provided by two reliability coordinators, the Midwest Independent 
Transmission System Operator (MISO) and PJM. 

New England, which is operated by the New England Independent System Operator (ISO-NE), was in the 
portion of the Eastern Interconnection that became separated, but was able to stabilize its generation and 
load with minimal loss, except for the southwest portion of Connecticut, which blacked out with New 
York City. Nova Scotia and Newfoundland were also not impacted severely. Hydro-Quebec operates the 
electric system in Quebec and was mostly unaffected by the blackout because this system is operated 
asynchronously from the rest of the interconnection. 

Several of the key players involved in the blackout are described in more detail below. 

2. FirstEnergy

FirstEnergy Corporation (FE) is the fifth largest electric utility in the United States. FE serves 4.4 million
electric customers in a 36,100 square mile service territory covering parts of Ohio, Pennsylvania, and 
New Jersey. FE operates 11,502 miles of transmission lines, and has 84 ties with 13 other electric 
systems. 


FE comprises seven operating companies (Figure 1.3). Four of these companies, Ohio Edison, Toledo 
Edison, The Illuminating Company, and Penn Power, operate in the ECAR region; MISO serves as their 
reliability coordinator. These four companies now operate as one integrated control area managed by FE. 
The remaining three FE companies, Penelec, Met-Ed, and Jersey Central Power & Light, are in the 
MAAC region and PJM is their reliability coordinator. This report addresses the FE operations in 
northern Ohio, within ECAR and the MISO reliability coordinator footprint. 



Figure 1.3 — FE Operating Areas 


FE operates several control centers in Ohio that perform different functions. The first is the unregulated 
Generation Management System (GMS), which is located in a separate facility from the transmission 
system operations center. The GMS handles the unregulated generation portion of the business, including 
Automatic Generation Control (AGC) for the FE units, managing wholesale transactions, determining 
fuel options for their generators, and managing ancillary services. On August 14, the GMS control center 
was responsible for calling on automatic reserve sharing to replace the 612 MW lost when the Eastlake 
Unit 5 tripped at 13:31. 

The second FE control center houses the Energy Management System (EMS). The EMS control center is 
charged with monitoring the operation and reliability of the FE control area and is managed by a director 
of transmission operation services. Two main groups report to the director. The first group is responsible 
for real-time operations and the second is responsible for transmission operations planning support. The 
operations planning group has several dispatchers who perform day-ahead studies in a room across the 
hall from the control room. 

The real-time operations group is divided into two areas: control area operators and transmission
operators. Each area has two positions that are staffed 24 hours a day. A supervisor with responsibility
for both areas is always present. The supervisors work 8-hour shifts (7:00-15:00, 15:00-23:00, and
23:00-7:00), while the other operators work 12-hour shifts (6:00-18:00 and 18:00-6:00). The
transmission operators are in the main control room; the control area operators are in a separate room.

Within the main control room there are two desks, or consoles, for the transmission operators: the 
Western Desk, which oversees the western portion of the system, and the Eastern Desk, which oversees 
the eastern portion of the system. There is also a desk for the supervisor in the back of the room. There 
are other desks for operators who are performing relief duty. 

In addition to the EMS control center, FE maintains several regional control centers. These satellite 
operating centers are responsible for monitoring the 34.5-kV and 23-kV distribution systems. These 
remote consoles are part of the GE/Harris EMS system discussed later in this report, and represent some 
of the remote console failures that occurred. 

3. MISO 

The Midwest Independent Transmission System Operator (MISO) is the reliability coordinator for a 
region that covers more than one million square miles, stretching from Manitoba, Canada, in the north to 
Kentucky in the south; from Montana in the west to western Pennsylvania in the east. Reliability 
coordination is provided by two offices, one in Minnesota, and the other at the MISO headquarters in 
Carmel, Indiana. MISO provides reliability coordination for 35 control areas, most of which are 
members of MISO. 

MISO became the reliability coordinator for FirstEnergy on February 1, 2003, when the ECAR-MET 
reliability coordinator office operated by AEP became part of PJM. FirstEnergy became a full member 
of MISO on October 1, 2003, six weeks after the blackout. 

Figure 1.4 — Midwest Reliability Coordinators


4. AEP 

American Electric Power (AEP), based in Columbus, Ohio, owns and operates more than 80 generating 
stations with more than 42,000 MW of generating capacity in the United States and international markets. 
AEP is one of the largest electric utilities in the United States, with more than five million customers 
linked to AEP’s 11-state electricity transmission and distribution grid. AEP’s 197,500-square-mile
service territory includes portions of Arkansas, Indiana, Kentucky, Louisiana, Michigan, Ohio,
Oklahoma, Tennessee, Texas, Virginia, and West Virginia. AEP operates approximately 39,000 miles of
electric transmission lines. AEP operates the control area in Ohio just south of the FE system.

AEP system operations functions are divided into two groups: transmission and control area operations. 
AEP transmission dispatchers issue clearances, perform restoration after an outage, and conduct other 
operations such as tap changing and capacitor bank switching. They monitor all system parameters,
including voltage. AEP control area operators monitor ACE, maintain contact with the PJM reliability 
coordinator, implement transaction schedules, watch conditions on critical flowgates, implement the 
NERC Transmission Loading Relief (TLR) process, and direct generator voltage schedules. AEP 
maintains and operates an energy management system complete with a state estimator and on-line 
contingency analysis that runs every five minutes. 

5. PJM Interconnection, LLC 

The PJM Interconnection, LLC (PJM) is AEP’s reliability coordinator. PJM’s reliability coordination 
activity is centered in its Valley Forge, Pennsylvania, headquarters with two operating centers, one in 
Valley Forge and one in Greensburg, Pennsylvania. There are two open video/audio live links between 
the west control center in Greensburg and the east control center in Valley Forge that provide for 
connectivity and presence between the two control centers. In training, operators are moved between all 
of the desks in Valley Forge and Greensburg. 

PJM is also an independent system operator. PJM recently expanded its footprint to include control areas
and transmission operators within MAIN and ECAR into an area it has designated as PJM-West. In PJM- 
East, the original PJM power pool, PJM is both the control area operator and reliability coordinator for 
ten utilities whose transmission systems span the Mid-Atlantic region of New Jersey, most of 
Pennsylvania, Delaware, Maryland, West Virginia, Ohio, Virginia, and the District of Columbia. At the 
time of the blackout, the PJM-West facility was the reliability coordinator desk for several control areas 
(Commonwealth Edison-Exelon, AEP, Duquesne Light, Dayton Power and Light, and Ohio Valley 
Electric Cooperative) and four generation-only control areas (Duke Energy’s Washington County (Ohio) 
facility, Duke’s Lawrence County/Hanging Rock (Ohio) facility, Allegheny Energy’s Buchanan (West
Virginia) facility, and Allegheny Energy’s Lincoln Energy Center (Illinois) facility).

6. ECAR 

The East Central Area Reliability Coordination Agreement (ECAR) is one of the ten NERC regional 
reliability councils. ECAR was established in 1967 as the forum to address matters related to the 
reliability of interconnected bulk electric systems in the east central part of the United States. ECAR 
members maintain reliability by coordinating the planning and operation of the members’ generation and 
transmission facilities. ECAR membership includes 29 major electricity suppliers located in nine states 
serving more than 36 million people. The FE and AEP systems of interest in Ohio are located within 
ECAR. 

ECAR is responsible for monitoring its members for compliance with NERC operating policies and 
planning standards. ECAR is also responsible for coordinating system studies conducted to assess the 
adequacy and reliability of its member systems. 

II. Conditions Prior to the Start of the Blackout Sequence

The electricity industry has developed and codified a set of mutually reinforcing reliability standards and 
practices to ensure that system operators are prepared to deal with unexpected system events. The basic 
assumption underlying these standards and practices is that power system elements will fail or become 
unavailable in unpredictable ways. The basic principle of reliability management is that “operators must 
operate to maintain the security of the system they have available.” 

Sound reliability management is geared toward ensuring the system will continue to operate safely 
following the unexpected loss of any element, such as a major generating or transmission facility. 
Therefore, it is important to emphasize that establishing whether conditions on the system were normal or 
unusual prior to and on August 14 would not in either case alleviate the responsibilities and actions 
expected of the power system operators, who are charged with ensuring reliability. 

In terms of day-ahead planning, system operators must analyze the system and adjust the planned outages 
of generators and transmission lines or scheduled electricity transactions, so that if a facility were lost
unexpectedly, the system operators would still be able to operate the remaining system within safe limits. 
In terms of real-time operations, this means that the system must be operated at all times to be able to 
withstand the loss of any single facility and still remain within thermal, voltage, and stability limits. If a 
facility is lost unexpectedly, system operators must take necessary actions to ensure that the remaining 
system is able to withstand the loss of yet another key element and still operate within safe limits. 
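The N-1 principle described above can be illustrated with a minimal sketch. It assumes a linearized ("DC") power flow, which ignores voltage magnitudes, reactive power, and losses, and applies it to a hypothetical three-bus loop: each line is removed in turn and the remaining lines are checked against assumed thermal ratings. The network data, ratings, and function names (`dc_flows`, `n_minus_1_violations`) are invented for illustration; production contingency analysis uses full AC models and also checks voltage and stability limits.

```python
from typing import Dict, List, Optional, Tuple

# (from_bus, to_bus, reactance_pu, rating_mw); hypothetical data only.
Line = Tuple[int, int, float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular network matrix: a bus is islanded")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dc_flows(n_bus: int, lines: List[Line], injections: Dict[int, float]) -> List[float]:
    """MW flow on each line (positive from_bus -> to_bus); bus 0 is the slack."""
    b = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x, _ in lines:
        b[i][i] += 1 / x
        b[j][j] += 1 / x
        b[i][j] -= 1 / x
        b[j][i] -= 1 / x
    # Fix the slack angle at zero and solve B' * theta = P for the other buses.
    reduced = [row[1:] for row in b[1:]]
    p = [injections.get(k, 0.0) for k in range(1, n_bus)]
    theta = [0.0] + solve(reduced, p)
    return [(theta[i] - theta[j]) / x for i, j, x, _ in lines]


def n_minus_1_violations(
    n_bus: int, lines: List[Line], injections: Dict[int, float]
) -> List[Tuple[int, Optional[int], float]]:
    """For each single-line outage, report (outaged, overloaded, flow_mw)."""
    problems: List[Tuple[int, Optional[int], float]] = []
    for k in range(len(lines)):
        remaining = lines[:k] + lines[k + 1:]
        try:
            flows = dc_flows(n_bus, remaining, injections)
        except ValueError:  # the outage splits the network; treat as insecure
            problems.append((k, None, float("inf")))
            continue
        for idx, (line, flow) in enumerate(zip(remaining, flows)):
            if abs(flow) > line[3]:
                original_index = idx if idx < k else idx + 1
                problems.append((k, original_index, flow))
    return problems


# Hypothetical three-bus loop: generation at bus 0, loads at buses 1 and 2.
LOADS = {1: -100.0, 2: -50.0}
lines_160 = [(0, 1, 0.1, 160.0), (0, 2, 0.1, 160.0), (1, 2, 0.1, 160.0)]
lines_140 = [(i, j, x, 140.0) for i, j, x, _ in lines_160]

print([round(f, 1) for f in dc_flows(3, lines_160, LOADS)])  # base-case flows
print(n_minus_1_violations(3, lines_160, LOADS))  # secure: no overloads
print(n_minus_1_violations(3, lines_140, LOADS))  # two post-contingency overloads
```

With 160 MW ratings every single-line outage leaves the remaining lines within limits, so this toy case is N-1 secure; with 140 MW ratings, losing either line from the generator bus pushes 150 MW onto the other one. A DC model is used here only because it reduces each contingency to a single linear solve.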

Actions system operators may take include adjusting the outputs of generators, curtailing electricity 
transactions, curtailing interruptible load, and shedding firm customer load to reduce electricity demand 
to a level that matches what the system is able to deliver safely. These practices have been designed to 
maintain a functional and reliable grid, regardless of whether actual operating conditions are normal. 

A. Summary of System Conditions on August 14, 2003 

This section reviews the status of the northeastern portion of the Eastern Interconnection prior to 15:05 on 
August 14. Analysis was conducted to determine whether system conditions at that time were in some 
way unusual and might have contributed to the initiation of the blackout. 

Using steady-state (power flow) analysis, investigators found that at 15:05, immediately prior to the 
tripping of FE Chamberlin-Harding 345-kV transmission line, the system was able to continue to operate 
reliably following the occurrence of any of more than 800 identified system contingencies, including the 
loss of the Chamberlin-Harding line. In other words, at 15:05 on August 14, 2003, the system was being 
operated within defined steady-state limits. 

Low voltages were found in the Cleveland-Akron area operated by FE on August 14 prior to the blackout. 
These voltages placed the system at risk for voltage collapse. However, it can be said with certainty that 
low voltage or voltage collapse did not cause the August 14 blackout. P-Q and V-Q analysis by 
investigators determined that the FE system in northeastern Ohio was near a voltage collapse, but that 
events required to initiate a voltage collapse did not occur. 
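V-Q analysis asks how much reactive power would have to be injected at a bus to hold a given voltage; the lowest point of that curve shows how close the bus is to having no operating point at all. The sketch below illustrates the idea on a textbook two-bus system (a source behind a lossless reactance feeding one load) using a fictitious condenser at the load bus. All parameter values and function names are hypothetical and are not the investigators' models or data.

```python
import math
from typing import List, Tuple


def vq_curve(e: float, x: float, p: float, q_load: float,
             v_lo: float = 0.3, v_hi: float = 1.2,
             step: float = 0.001) -> List[Tuple[float, float]]:
    """Points (V, Qc) for a load bus fed from a source E through reactance X.

    Qc is the reactive power a fictitious condenser at the load bus would have
    to inject to hold voltage V while the bus serves P + jQ_load. All values
    are per unit and the line is assumed lossless.
    """
    points = []
    steps = int(round((v_hi - v_lo) / step))
    for i in range(steps + 1):
        v = v_lo + i * step
        arg = (e * v) ** 2 - (p * x) ** 2
        if arg < 0:  # the line cannot carry P at this low a voltage
            continue
        q_received = (math.sqrt(arg) - v * v) / x  # reactive power arriving over the line
        points.append((v, q_load - q_received))
    return points


def reactive_margin(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (margin, voltage at the curve minimum).

    A positive margin is how much extra reactive load the bus could take
    before no operating point exists; a negative margin is a deficit.
    """
    v_at_min, qc_min = min(points, key=lambda pt: pt[1])
    return -qc_min, v_at_min


# Hypothetical two-bus case: E = 1.0 pu, X = 0.5 pu, P = 0.6 pu.
margin, v_min = reactive_margin(vq_curve(1.0, 0.5, 0.6, q_load=0.2))
print(round(margin, 3), round(v_min, 3))
```

For these values the minimum of the curve sits near 0.58 pu voltage with roughly 0.12 pu of reactive margin; raising the reactive load to 0.4 pu makes the margin negative, meaning the bus could not operate without added reactive support.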

Investigators analyzed externalities that could have had adverse effects on the FE system in northeastern 
Ohio and determined that none of them caused the blackout. August 14 was warm in the Midwest and 
Northeast. Temperatures were above normal and there was very little wind; the weather was typical of a
warm summer day. The warm weather caused electrical demand in northeastern Ohio to be high, but 
electrical demand was not close to a record level. Voltages were sagging in the Cleveland-Akron area 
due to a shortage of reactive power resources and the heavy air-conditioning loads, causing the FE system
in that area to approach a voltage collapse condition. 

Investigators also analyzed the interregional power transfers occurring on August 14 and determined that 
transfers across the area were high but within studied limits and below historical values, and that they did not
cause the blackout. Frequency anomalies on the Eastern Interconnection on August 14 prior to the
blackout were determined to be caused by scheduling practices and were unrelated to the blackout. 

In summary, prior to the 15:05 trip of the Chamberlin-Harding 345-kV line, the power system was within 
the operating limits defined by FE, although it was determined that FE had not effectively studied the 
minimum voltage and reactive supply criteria of its system in the Cleveland-Akron area. Investigators 
eliminated factors such as high power flows to Canada, low voltages earlier in the day or on prior days, 
the unavailability of specific generators or transmission lines (either individually or in combination with 
one another), and frequency anomalies as causes of the blackout. 

B. Electric Demand and Comparisons to Historical Levels 

August 14 was a hot summer day, but not unusually so. Temperatures were above normal throughout the 
northeast region of the United States and in eastern Canada. Electricity demand was high due to high air- 
conditioning loads typical of warm days in August. However, electricity demands were below record 
peaks. System operators had successfully managed higher demands both earlier in the summer and in 
previous years. Northern Ohio was experiencing an ordinary August afternoon, with loads moderately 
high to serve air-conditioning demand. FE imports into its Northern Ohio service territory that afternoon 
peaked at 2,853 MW, causing its system to consume high levels of reactive power. 
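One reason heavy imports raise reactive consumption is that the reactive power absorbed by the series reactance of lines and transformers grows with the square of the current they carry. A minimal sketch of that I²X relation, using a hypothetical reactance and per-unit base rather than any FE facility data (the function name is illustrative):

```python
def reactive_loss_mvar(p_mw: float, q_mvar: float, v_pu: float, x_pu: float,
                       base_mva: float = 100.0) -> float:
    """Series reactive loss I^2 * X on a line or transformer, returned in MVAr.

    Current magnitude in per unit is |S| / V, so the loss grows with the square
    of the power carried and inversely with the square of the voltage.
    """
    s_pu_sq = (p_mw / base_mva) ** 2 + (q_mvar / base_mva) ** 2
    return s_pu_sq / v_pu ** 2 * x_pu * base_mva


# Hypothetical path with X = 0.005 pu on a 100-MVA base, real power only.
for flow in (500.0, 1000.0, 1500.0):
    print(flow, round(reactive_loss_mvar(flow, 0.0, 1.0, 0.005), 1))
```

Doubling the flow quadruples the reactive loss, and depressed voltage makes it worse, because a lower voltage requires more current to carry the same power.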

Figure II.1 — August 14 Temperatures in Northeastern United States and Eastern Canada


Table II.1 displays the peak load demands for AEP, Michigan Electric Coordinated System (MECS), FE,
and PJM during the week of August 11, along with the temperatures measured at the Akron-Canton
Airport. As the daily high temperature in northeastern Ohio (represented by temperatures at the Akron-Canton
airport) increased from 78° F on August 11 to 87° F on August 14, the FE control area peak load
demand increased by 20.5 percent, from 10,095 MW to 12,165 MW. The loads in the surrounding systems
experienced similar increases. 

It is noteworthy that the FE control area peak load on August 14 was also the peak load for the summer of 
2003, although it was not the all-time peak recorded for that system. That record was set on August 1, 
2002, at 13,299 MW, which is 1,134 MW higher than the August 14, 2003, peak. Given the correlation of load increase
with ambient temperature, especially over a period of several days of warm weather, it is reasonable to 
assume that the load increase was due at least in part to the increased use of air conditioners. These 
increased air-conditioning loads lowered power factors compared to earlier in the week. These are 
important considerations when assessing voltage profiles, reactive reserves, and voltage stability. 
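The link between air-conditioning load, lower power factor, and higher reactive demand follows from the power triangle, Q = P · tan(arccos pf). The sketch below is illustrative only: the 1,000 MW load block and the two power factors (0.95 and 0.90) are assumed round numbers, not values from the investigation.

```python
import math

def reactive_power_mvar(p_mw: float, power_factor: float) -> float:
    """Reactive power (Mvar) drawn by a lagging load of p_mw at the given power factor.

    Power triangle: Q = P * tan(arccos(pf)).
    """
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power factor must be in (0, 1]")
    return p_mw * math.tan(math.acos(power_factor))

# Hypothetical illustration: the same 1,000 MW of load at two power factors.
for pf in (0.95, 0.90):
    print(f"pf = {pf:.2f}: {reactive_power_mvar(1000.0, pf):.0f} Mvar")
```

With these assumed values, the same 1,000 MW of load draws roughly 329 Mvar at a power factor of 0.95 and roughly 484 Mvar at 0.90, about 156 Mvar more with no change in real power.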
Table II.1: System Conditions for the Week August 11-14, 2003

All load values in MW                  Mon. Aug. 11    Tue. Aug. 12    Wed. Aug. 13    Thu. Aug. 14
Dry bulb temperature at                78° F           83° F           85° F           87° F
  Akron-Canton Airport
FE daily peak load in Northern Ohio    10,095          10,847 (7.5)    11,556 (14.5)   12,165 (20.5)
MECS load at 14:00                     15,136          15,450 (2.1)    17,335 (14.5)   18,796 (24.2)
AEP load at 14:00                      17,321          18,058 (4.3)    18,982 (9.6)    19,794 (14.3)
PJM peak load                          52,397          56,683 (8.2)    58,503 (11.7)   60,740 (15.9)

Values in parentheses are the percent increase from August 11.
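The percentages in Table II.1 are relative increases over the Monday, August 11 value. A short check, using only the MW figures printed in the table, reproduces them:

```python
# Loads from Table II.1 (MW), Monday Aug. 11 through Thursday Aug. 14.
LOADS_MW = {
    "FE daily peak": [10_095, 10_847, 11_556, 12_165],
    "MECS at 14:00": [15_136, 15_450, 17_335, 18_796],
    "AEP at 14:00": [17_321, 18_058, 18_982, 19_794],
    "PJM peak": [52_397, 56_683, 58_503, 60_740],
}

def percent_increase(base: float, value: float) -> float:
    """Percent change of value relative to base."""
    return 100.0 * (value - base) / base

def increases_from_first_day(series):
    """Percent increase of each later day over the first day, rounded to 0.1."""
    base = series[0]
    return [round(percent_increase(base, v), 1) for v in series[1:]]

for name, series in LOADS_MW.items():
    print(name, increases_from_first_day(series))
```

Every recomputed value matches the table except FE on August 12, which comes out at 7.4 percent rather than the 7.5 percent shown, a 0.1-point difference that may reflect rounding in the source data.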


As shown in Table II.2, FE’s recorded peak electrical demands on August 14 and in prior months were 
well below the previously recorded peak demand. 

Table II.2: Loads on August 14 Compared to Summer 2003 and Summer 2002 Peaks

Month/Year     Actual Peak Load for Month    Date of Peak
August 2002    13,299 MW                     August 1, 2002 (all-time peak)
June 2003      11,715 MW
July 2003      11,284 MW
August 2003    12,165 MW                     August 14, 2003 (summer 2003 peak)


The day-ahead projections for the FE control area, as submitted to ECAR around 16:00 each afternoon
that week, are shown in Table II.3. The projected peak load for August 14 was 765 MW lower than the
actual FE load. FE's load forecasts were low on each day from August 12 through August 14. FE forecast
that it would be a net importer over this period, with projected imports peaking at 2,367 MW on
August 14. Actual imports on August 14 peaked at 2,853 MW.


Table II.3: FE Day-ahead Load Projections for Week of August 11-14, 2003

All values are in MW                           Mon. Aug. 11  Tue. Aug. 12  Wed. Aug. 13  Thu. Aug. 14
Projected Peak Load                            10,300        10,200        11,000        11,400
Capacity Synchronized                          10,335        10,291        10,833        10,840
Projected Import                               1,698         1,800         1,818         2,367
Projected Export (to PJM)                      1,378         1,378         1,278         1,278
Net Interchange (negative value is an import)  -320          -422          -540          -1,089
Spinning Reserve                               355           513           373           529
Unavailable Capacity                           1,100         1,144         1,263         1,433
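The Net Interchange row of Table II.3 follows from the two rows above it under the stated sign convention (export minus import, so a negative value is a net import). The sketch below recomputes that row and the August 14 forecast misses quoted in the text; every input is copied from the table or the text.

```python
# Day-ahead values from Table II.3 (MW), Monday Aug. 11 through Thursday Aug. 14.
projected_import = [1_698, 1_800, 1_818, 2_367]
projected_export = [1_378, 1_378, 1_278, 1_278]

def net_interchange(export_mw: int, import_mw: int) -> int:
    """Net interchange with the table's sign convention: negative means net import."""
    return export_mw - import_mw

net = [net_interchange(e, i) for e, i in zip(projected_export, projected_import)]
print("net interchange:", net)

# August 14: day-ahead projections versus the actual values quoted in the text.
forecast_peak_mw, actual_peak_mw = 11_400, 12_165
forecast_import_mw, actual_import_mw = 2_367, 2_853
print("peak load under-forecast:", actual_peak_mw - forecast_peak_mw, "MW")
print("import under-forecast:", actual_import_mw - forecast_import_mw, "MW")
```

The recomputed row (-320, -422, -540, and -1,089 MW) matches the table. On August 14 the peak load was under-forecast by 765 MW and the peak import by 486 MW.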


C. Facilities out of Service 

On any given day, generation and transmission capacity is unavailable; some facilities are out for routine 
maintenance, and others have been forced out by an unanticipated breakdown and need for repairs. 
August 14 was no exception. 
1. Planned Generation Outages

Several key generators were out of service going into August 14. 


Table II.4: Key Generators Not Available on August 14, 2003

Generator                  Rating                Reason for Outage
Davis-Besse Nuclear Unit   934 MW, 481 Mvar      Prolonged NRC-ordered outage beginning on 3/22/02
Eastlake Unit 4            267 MW, 150 Mvar      Forced outage on 8/13/03
Monroe Unit 1              780 MW, 420 Mvar      Planned outage, taken out of service on 8/8/03
Cook Nuclear Unit 2        1,060 MW, 460 Mvar    Outage began on 8/13/03
Conesville 5               400 MW, 145 Mvar      Tripped at 12:05 on August 14 due to fan trip and
                                                 high boiler drum pressure while returning a day
                                                 early from a planned outage


These generating units provide real and reactive power directly to the Cleveland, Toledo, and Detroit 
areas. Under routine practice, system operators take into account the unavailability of such units and any 
transmission facilities known to be out of service in the day-ahead planning studies they perform to 
determine the condition of the system for the next day. Knowing the status of key facilities also helps 
operators to determine in advance the safe electricity transfer levels for the coming day. MISO’s
day-ahead planning studies for August 14 took these generator outages and known transmission outages
into account and determined that the regional system could be operated safely. Investigators’ analysis
confirms that the unavailability of these generating units did not cause the blackout.

2. Transmission and Generating Unit Unplanned Outages Earlier in the Day of August 14

Several unplanned outages occurred on August 14 prior to 15:05. Around noon, several transmission 
lines in south-central Indiana tripped; at 13:31, the Eastlake 5 generating unit along the shore of Lake Erie 
tripped; at 14:02, the Stuart-Atlanta 345-kV line in southern Ohio tripped. 

At 12:08, Cinergy experienced forced outages of its Columbus-Bedford 345-kV transmission line in 
south-central Indiana, the Bloomington-Denois Creek 230-kV transmission line, and several 138-kV 
lines. Although the loss of these lines caused significant voltage and facility loading problems in the 
Cinergy control area, they had no electrical effect on the subsequent events in northeastern Ohio leading 
to the blackout. The Cinergy lines remained out of service during the entire blackout (except for some 
reclosure attempts). 

MISO operators assisted Cinergy by implementing TLR procedures to reduce flows on the transmission 
system in south-central Indiana. Despite having no direct electrical bearing on the blackout, these early 
events are of interest for three reasons: 

• The Columbus-Bedford line trip was caused by a tree contact, the same cause as the initial line
trips that later began the blackout sequence in northeastern Ohio. The Bloomington-Denois Creek
230-kV line tripped due to a downed conductor caused by a conductor sleeve failure.

• The Bloomington-Denois Creek 230-kV outage was not automatically communicated to the 
MISO state estimator and the missing status of this line caused a large mismatch error that 
stopped the MISO state estimator from operating correctly at about 12:15. 
• Several hours before the start of the blackout, MISO was using the TLR procedure to offload
flowgates in the Cinergy system following multiple contingencies. Although investigators 
believe this prior focus on TLR in Cinergy was not a distraction for later events that began in 
Ohio, it is indicative of the approach that was being used to address post-contingency facility 
overloads. 
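State estimators typically fit a network model to telemetered measurements by weighted least squares and then test the size of the residuals; a wrong line status leaves the model unable to fit the data, which is the kind of mismatch described in the second bullet. The toy below illustrates that mechanism only. It is a three-bus DC (linearized) estimator written for this example, not MISO's software, and the network, line susceptance B, meter accuracy SIGMA, and chi-square threshold are all assumptions. In the "real" system line 2-3 is open; the estimator is run once with the line wrongly modeled in service and once with the correct status.

```python
B = 10.0      # assumed susceptance of every line (per unit)
SIGMA = 0.01  # assumed standard deviation of every meter (per unit)
CHI2_THRESHOLD = 11.34  # ~99th percentile of chi-square with 5 - 2 = 3 degrees of freedom

# Telemetered values (per unit): flows P12, P13, P23, then injections P2, P3.
# In the "real" system line 2-3 is open, so its flow meter reads zero.
z = [1.0, 0.5, 0.0, -1.0, -0.5]

# Measurement models: each row maps (theta2, theta3) to one measurement; bus 1 is the reference.
H_line23_in = [[-B, 0.0], [0.0, -B], [B, -B], [2 * B, -B], [-B, 2 * B]]
H_line23_out = [[-B, 0.0], [0.0, -B], [0.0, 0.0], [B, 0.0], [0.0, B]]

def wls_estimate(H, z, sigma):
    """Weighted least squares for two bus angles; returns (theta, J).

    Solves the normal equations (H^T W H) theta = H^T W z with W = I / sigma^2,
    then returns the weighted sum of squared residuals J.
    """
    w = 1.0 / sigma**2
    g11 = w * sum(r[0] * r[0] for r in H)
    g12 = w * sum(r[0] * r[1] for r in H)
    g22 = w * sum(r[1] * r[1] for r in H)
    b1 = w * sum(r[0] * zi for r, zi in zip(H, z))
    b2 = w * sum(r[1] * zi for r, zi in zip(H, z))
    det = g11 * g22 - g12 * g12
    theta = ((g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det)
    residuals = [zi - (r[0] * theta[0] + r[1] * theta[1]) for r, zi in zip(H, z)]
    return theta, w * sum(res * res for res in residuals)

for label, H in (("line 2-3 modeled in service", H_line23_in),
                 ("line 2-3 modeled out of service", H_line23_out)):
    _, J = wls_estimate(H, z, SIGMA)
    verdict = "residuals too large: reject solution" if J > CHI2_THRESHOLD else "consistent"
    print(f"{label}: J = {J:.1f} ({verdict})")
```

With the wrong status the weighted residual J comes out near 833, far above the assumed threshold of about 11.3, so the solution would be rejected. With the correct status the fit is essentially exact, because the toy measurements are noise-free.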

Eastlake Unit 5, located near Cleveland on the shore of Lake Erie, is a generating unit with a normal 
rating of 597 MW that is a major source of reactive power support for the Cleveland area. It tripped at 
13:31 carrying 612 MW and 400 Mvar. The unit tripped because, as the Eastlake 5 unit operator sought 
to increase the unit’s reactive power output in response to a request from the FE system operator, the 
unit’s protection system detected an excitation (voltage control) system failure and tripped the unit
offline. The loss of the unit required FE to import additional power to make up for the loss of the 612 MW
in the Cleveland area, made voltage management in northern Ohio more challenging, and gave FE 
operators less flexibility in operating their system. With two of Cleveland’s generators already shut down 
(Davis-Besse and Eastlake 4), the loss of Eastlake 5 further depleted critical voltage support for the 
Cleveland-Akron area. Detailed simulation modeling reveals that the loss of Eastlake 5 was a significant 
factor in the outages later that afternoon; with Eastlake 5 forced out of service, transmission line loadings 
were notably higher but well below ratings. The Eastlake 5 unit trip is described in greater detail in 
Section III. 

The Stuart-Atlanta 345-kV line, a Dayton Power and Light (DP&L) tie to AEP that is in the PJM-West 
reliability coordination area, tripped at 14:02. The line tripped as the result of a tree contact and remained 
out of service during the entire blackout. System modeling showed that this outage was not related 
electrically to subsequent events in northern Ohio that led to the blackout. However, since the line was 
not in MISO’s footprint, MISO operators did not monitor the status of this line and did not know that it 
had tripped out of service. Having an incorrect status for the Stuart-Atlanta line caused MISO’s state 
estimator to continue to operate incorrectly, even after the previously mentioned mismatch was corrected. 

D. Power Transfers and Comparisons to Historical Levels 

On August 14, the flow of power through the ECAR region was heavy as a result of large transfers of 
power from the south (Tennessee, Kentucky, Missouri, etc.) and west (Wisconsin, Minnesota, Illinois, 
etc.) to the north (Michigan) and east (New York). The destinations for much of the power were northern 
Ohio, Michigan, and Ontario, Canada. 

While heavy, these transfers were not beyond previous levels or in directions not seen before. The level 
of imports into Ontario on August 14 was high but not unusually so. Ontario’s IMO is a frequent
importer of power, depending on the availability and price of generation within Ontario. IMO had safely
imported similar and even larger amounts of power several times during the summers of 2003 and 2002.
Figure II.2: Generation, Demand, and Interregional Power Flows on August 14 at 15:05


Figure II.3 shows that the imports into the area comprising Ontario, New York, PJM, and ECAR on
August 14 (shown by the red circles to be approximately 4,000 MW throughout the day) were near the 
peak amount of imports into that area for the period June 1 to August 13, 2003, although the August 14 
imports did not exceed amounts previously seen. 
Figure II.3: August 14, 2003, Northeast-Central Scheduled Transfers Compared to Historical Values

Figure II.4 shows the aggregated imports of the companies around Lake Erie (MECS, IMO, FE, DLCO,
NYISO, and PJM) for the peak summer days in 2002 and the days leading up to August 14, 2003. The 
comparison shows that the imports into the Lake Erie area were increasing in the days just prior to August 
14, but that the level of these imports was lower than those recorded during the peak periods in the 
summer of 2002. Indeed, the import values in 2002 were about 20 percent higher than those recorded on 
August 14. Thus, although the imports into the Lake Erie area on August 14 were high, they were not 
unusually high compared to previous days in the week and were certainly lower than those recorded the 
previous summer. 



Figure II.4: Imports for Lake Erie Systems
Another view of transfers is provided by examining the imports into IMO. Figure II.5 shows the total
hourly imports into IMO for 2002 and 2003 for all days during July and August. These data show that the 
import levels in 2003 were generally lower compared to 2002 and that the peak import on August 14, 
2003, at 2,130 MW at 14:00 was half the value recorded for the peak period in the summer of 2002. 



Figure II.5: Hourly Imports into IMO (July to August)


E. Voltage and Reactive Power Conditions Prior to the Blackout 

1. FE Voltage Profiles 

Unlike frequency, which is the same at any point in time across the interconnection, voltage varies by 
location and operators must monitor voltages continuously at key locations across their systems. During 
the days and hours leading up to the blackout, voltages were routinely depressed in a variety of locations 
in northern Ohio because of power transfers across the region, high air-conditioning demand, and other 
loads. During an interview, one FE operator stated, “some [voltage] sagging would be expected on a hot 
day, but on August 14 the voltages did seem unusually low.” However, as shown below in Figures II.6
and II.7, actual measured voltage levels at key points on the FE transmission system on the morning of 
August 14 and up to 15:05 were within the range previously specified by FE as acceptable. Note, 
however, that most control areas in the Eastern Interconnection have set their low voltage limits at levels 
higher than those used by FE. 
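Whether a voltage reading counts as acceptable depends entirely on the low-voltage limit applied to it, which is why the comparison with neighboring control areas matters. The sketch below converts 345-kV readings to per unit and lists the buses below a given limit. The bus names, readings, and both limits (0.92 and 0.95 per unit) are hypothetical and are not FE's or any neighbor's actual criteria.

```python
BASE_KV = 345.0  # nominal voltage of the 345-kV network

def per_unit(kv: float, base_kv: float = BASE_KV) -> float:
    """Convert a measured voltage in kV to per unit of the nominal voltage."""
    return kv / base_kv

def buses_below(readings_kv: dict, low_limit_pu: float) -> list:
    """Names of buses below low_limit_pu, lowest voltage first."""
    low = sorted((per_unit(kv), name) for name, kv in readings_kv.items()
                 if per_unit(kv) < low_limit_pu)
    return [name for _, name in low]

# Hypothetical readings (kV); not measured August 14 values.
readings = {"Bus A": 336.0, "Bus B": 326.0, "Bus C": 322.0, "Bus D": 345.0}

for limit_pu in (0.92, 0.95):  # two hypothetical low-voltage limits
    print(f"limit {limit_pu:.2f} pu -> below limit: {buses_below(readings, limit_pu)}")
```

With these assumed numbers no bus violates the 0.92 per-unit limit, while two buses violate the 0.95 per-unit limit, even though the readings are identical.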

Generally speaking, voltage management can be especially challenging on hot summer days because of 
high transfers of power and high air-conditioning requirements, both of which increase the need for 
reactive power. Operators address these challenges through long-term planning, day-ahead planning, and 
real-time adjustments to operating equipment. On August 14, for example, investigators found that most 
systems in the northeastern portion of the Eastern Interconnection were implementing critical voltage 
procedures that are routinely used for heavy load conditions. 
Figure II.6: Representative Voltage Profile on FE System during Week of August 11

Figure II.7: 345-kV Voltages in Northeastern Ohio on August 14, 2003

The existence of low voltages in northern Ohio is consistent with the patterns of power flow and 
composition of load on August 14. The power flow patterns for the region just before the Chamberlin- 
Harding line tripped at 15:05 show that FE was a major importer of power. Air-conditioning loads in the 
metropolitan areas around the southern end of Lake Erie were also consuming reactive power (Mvar). 
The net effect of the imports and load composition was to depress voltages in northern Ohio. Consistent 
with these observations, the analysis of reactive power flow shows that northern Ohio was a net importer 
of reactive power. 
FE operators began to address voltage concerns early in the afternoon of August 14. For example, at
13:33, the FE operator requested that capacitors at Avon Substation be restored to service. From 13:13 
through 13:28, the FE system operator called nine power plant operators to request additional voltage 
support from generators. He noted to most of them that system voltages were sagging. The operator 
called the following plants: 

• Sammis plant at 13:13: “Could you pump up your 138 voltage?” 

• West Lorain at 13:15: “Thanks. We’re starting to sag all over the system.” 

• Eastlake at 13:16: “We got a way bigger load than we thought we would have. So we’re starting 
to sag all over the system.” 

• Three calls to other plants between 13:20 and 13:23, stating to one: “We’re sagging all over the 
system. I need some help.” Asking another: “Can you pump up your voltage?” 

• “Unit 9” at 13:24: “Could you boost your 345?” Two more at 13:26 and at 13:28: “Could you 
give me a few more volts?” 

• Bayshore at 13:41 and Perry 1 operator at 13:43: “Give me what you can. I’m hurting.” 

• 14:41 to Bayshore: “I need some help with the voltage... I’m sagging all over the system...” The
response to the FE Western Desk: “We’re fresh out of vars.”

Several station operators said that they were already at or near their reactive output limits. Following the 
loss of Eastlake 5 at 13:31, FE operators’ concern about voltage levels was heightened. Again, while 
there was substantial effort to support voltages in the Ohio area, FE personnel characterized the 
conditions as not being unusual for a peak load day. No generators were asked to reduce their active 
power output to be able to produce more reactive output. 

P-Q and V-Q analysis by investigators determined that the low voltages and low reactive power margins 
in the Cleveland-Akron area on August 14 prior to the blackout could have led to a voltage collapse. In 
other words, the FE system in northeastern Ohio was near a voltage collapse on August 14, although that 
was not the cause of the blackout. 
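The investigators' P-Q and V-Q analyses were run on the full system model. As a much simpler illustration of what "near a voltage collapse" means, the textbook two-bus case below (a stiff source of voltage e feeding a load through a lossless reactance x, with hypothetical per-unit values) gives the load-bus voltage and the "nose" of the P-V curve, beyond which no operating point exists. It also shows that a lower load power factor lowers the maximum deliverable power.

```python
import math

def receiving_voltage_pu(p: float, q: float, e: float = 1.0, x: float = 0.5):
    """Upper (normal) solution for the load-bus voltage V, in per unit.

    Two-bus model: source voltage e behind a lossless reactance x, feeding a load
    p + jq. Eliminating the angle gives V^4 + (2qx - e^2) V^2 + x^2 (p^2 + q^2) = 0.
    Returns None past the nose of the P-V curve, where no real solution exists.
    """
    a = e**2 - 2.0 * q * x
    disc = a**2 - 4.0 * x**2 * (p**2 + q**2)
    if disc < 0.0:
        return None
    return math.sqrt((a + math.sqrt(disc)) / 2.0)

def nose_point_p(power_factor: float, e: float = 1.0, x: float = 0.5) -> float:
    """Maximum real power (per unit) deliverable to a load of constant lagging power factor.

    Closed form for the two-bus model: P_max = e^2 cos(phi) / (2 x (1 + sin(phi))).
    """
    phi = math.acos(power_factor)
    return e**2 * math.cos(phi) / (2.0 * x * (1.0 + math.sin(phi)))

# Hypothetical per-unit values; not a model of the FE system.
for pf in (1.0, 0.95, 0.90):
    p_max = nose_point_p(pf)
    q_max = p_max * math.tan(math.acos(pf))
    v_near_nose = receiving_voltage_pu(0.999 * p_max, 0.999 * q_max)
    print(f"pf={pf:.2f}: P_max={p_max:.3f} pu, V just below the nose={v_near_nose:.3f} pu")
```

In this toy model the maximum load falls from 1.0 pu at unity power factor to about 0.72 pu at 0.95 and about 0.63 pu at 0.90, and the voltage near the nose is far below normal; operating close to that point leaves little margin before collapse.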

The voltage profiles of the 345-kV network in the west-to-east and north-to-south directions were plotted 
from available SCADA data for selected buses. The locations of these buses are shown in Figures II.8 
and II.9, respectively. They extend from Allen Junction, an FE interconnection point within ITC to the
west, to Homer City in PJM to the east, and from St. Clair in ITC to the north to Cardinal-Tidd in AEP to 
the south. 

There are three observations that can be made from these voltage profiles: 

• The voltage profiles in both west-to-east and north-to-south directions display a dip at the center, 
with FE critical buses in the Cleveland-Akron area forming a low voltage cluster at Avon Lake, 
Harding, Juniper, Chamberlin, and Star. 

• Voltages were observed to be higher in the portions of the FE control area outside of the 
Cleveland-Akron area. Voltages bordering FE in adjacent control areas were observed to be 
higher still. The bus voltages outside the Cleveland-Akron area were consistently higher during the
period leading up to August 14.

• The bus voltages in the Cleveland-Akron area showed a greater decline as the week progressed
compared to buses outside this area.




Figure II.9: North-to-South Voltage Profile
Analysis showed that the declining voltages in the Cleveland-Akron area were strongly influenced by the
increasing temperatures and loads in that area and minimally affected by transfers through FE to other 
systems. FE did not have sufficient reactive supply in the Cleveland-Akron area on August 14 to meet 
reactive power demands and maintain a safe margin from voltage collapse. 

2. FE Reactive Reserves 

Figure II.10 shows the actual reactive power reserves from representative generators along the Lake Erie
shore and in the Cleveland-Akron area for three time periods on August 14. It also shows the reactive 
power reserves from representative generators in AEP, MECS, and PJM that are located in the proximity 
of their interconnections with FE. 
Figure II.10: Reactive Reserves of Representative Groups of Generators on August 14, 2003
(panels include Mvar reactive reserves of representative units at approximately 1:00 p.m. EDT and at
approximately 4:00 p.m. EDT)
The following observations may be made:

• Reactive power reserves from the FE generators located in the Cleveland-Akron area were 
consistently lower than those from generators in both neighboring systems and in the southern 
portion of the FE system. These reserves were less than the reactive capability of the Perry 
nuclear generating station, the largest generating unit in the area, meaning that if the Perry unit 
had tripped offline, the Cleveland-Akron area would have been depleted of any reactive power 
reserve. 

• The reactive reserves in the Cleveland-Akron area were progressively reduced as successive 
outages occurred on the afternoon of August 14. By 16:00, after numerous 138-kV line failures, 
the reserve margins in the Cleveland-Akron area were depleted. 

• Generators external to this area had ample reactive margins while maintaining their scheduled 
voltages, but that reactive power was unable to reach the Cleveland-Akron area due to the limited 
ability of reactive power to flow over long distances. These included the generator group located 
southeast of Akron, consisting of Sammis, Beaver Valley, and Mansfield. 
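The Perry comparison in the first bullet is a simple single-contingency test: is the area's total reactive reserve larger than the reactive capability of its largest unit? A minimal sketch of that check follows; the unit names, reserve values, and the 500-Mvar capability are hypothetical and are not read from Figure II.10.

```python
def margin_after_largest_unit_trip(area_reserves_mvar: dict, largest_unit_mvar: float) -> float:
    """Area reactive reserve (Mvar) left over if the largest unit's reactive capability is lost.

    Mirrors the comparison in the text: total reserve of the area's generators
    versus the reactive capability of the area's largest unit. A negative result
    means the area would be left with no reactive reserve at all.
    """
    return sum(area_reserves_mvar.values()) - largest_unit_mvar

# Hypothetical snapshots of unused reactive capability (Mvar) for three units.
snapshots = {
    "early afternoon": {"Unit A": 180.0, "Unit B": 150.0, "Unit C": 120.0},
    "late afternoon": {"Unit A": 40.0, "Unit B": 20.0, "Unit C": 0.0},
}
LARGEST_UNIT_MVAR = 500.0  # hypothetical reactive capability of the largest unit

for when, reserves in snapshots.items():
    margin = margin_after_largest_unit_trip(reserves, LARGEST_UNIT_MVAR)
    status = "reserve exhausted" if margin < 0 else "reserve remains"
    print(f"{when}: margin {margin:+.0f} Mvar ({status})")
```

Tracking the same margin at successive snapshots shows the progressive depletion described in the second bullet.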

F. System Frequency 

Figure II.11 shows a plot of the frequency for the Eastern Interconnection on August 14. As is typical,
frequency varies randomly within a narrow band of several hundredths of a hertz. Prior to the
blackout, frequency was within the statistical bounds of a typical day. Scheduled frequency was lowered
to 59.98 Hz at noon to conduct a time error correction, a routine operation. After the blackout, the
frequency was high and highly variable following the loss of exports to the Northeast. The larger
frequency deviations also appear to follow a recurring pattern in time.
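A time error correction works because synchronous electric clocks keep time in proportion to frequency. Holding frequency at 59.98 Hz slows such clocks slightly, working off time error that built up while frequency averaged above 60 Hz. The arithmetic below assumes frequency is held exactly at the scheduled value, which real frequency never is; it shows that a 0.02-Hz offset corrects about 1.2 seconds of accumulated time error per hour.

```python
NOMINAL_HZ = 60.0

def clock_error_seconds(frequency_hz: float, duration_s: float) -> float:
    """Time gained (+) or lost (-) by a synchronous clock when the interconnection
    runs steadily at frequency_hz for duration_s seconds."""
    return duration_s * (frequency_hz - NOMINAL_HZ) / NOMINAL_HZ

# Scheduled frequency during the time error correction that began at noon.
per_hour = clock_error_seconds(59.98, 3600.0)
print(f"{per_hour:.2f} s per hour")
```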



Figure II.11: Eastern Interconnection Frequency Plot for August 14, 2003 (time in EDT)

System frequency anomalies earlier in the day on August 14 are explained by previously known 
interchange scheduling issues and were not a precursor to the blackout. Although frequency was 
somewhat variable on August 14, it was well within the bounds of safe operating practices as outlined in 
NERC operating policies and consistent with historical values. 

Large excursions in the random variation of frequency were seen on August 14, but such excursions were
typical of most other days as well, indicating a need for attention to the effects of interchange scheduling
on interconnection frequency. Frequency generally appeared to be running high, which is not by itself a
problem but indicates that there were insufficient resources to control frequency under the existing
scheduling practices. This behavior indicates that frequency anomalies seen on August 14 prior to the 
blackout were caused by the ramping of generation around regular scheduling time blocks and were 
neither the cause of the blackout nor precursor signals of a system failure. The results of this 
investigation should help to analyze control performance in the future. 

G. Contingency Analysis of Conditions at 15:05 EDT on August 14 

A power flow base case was established for 15:05 on August 14 that encompassed the entire northern 
portion of the Eastern Interconnection. Investigators benchmarked the case to recorded system conditions 
at that time. The team started with a projected summer 2003 power flow case developed in the spring of 
2003 by the regional reliability councils. The level of detail involved in this region-wide study exceeded 
that normally considered by individual control areas and reliability coordinators. It consisted of a detailed 
representation of more than 44,300 buses, 59,086 transmission lines and transformers, and 6,987 major 
generators across the northern United States and eastern Canada. The team then revised the summer 
power flow case to match recorded generation, demand, and power interchange levels among control 
areas at 15:05 on August 14. The benchmarking consisted of matching the calculated voltages and line 
flows to recorded observations at more than 1,500 locations within the grid at 15:05. 

Once the base case was benchmarked, the team ran a contingency analysis that considered more than 800 
possible events as points of departure from the 15:05 case. None of these contingencies were found to 
result in a violation of a transmission line loading or bus voltage limit prior to the trip of the Chamberlin- 
Harding line in the FE system. According to these simulations, at 15:05, the system was able to continue 
to operate safely following the occurrence of any of the tested contingencies. From an electrical 
standpoint, the system was being operated within steady state limits at that time. Although the system 
was not in a reliable state with respect to reactive power margins, that deficiency did not cause the 
blackout. 
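The investigators' contingency analysis used a detailed model with tens of thousands of buses. The sketch below only illustrates the screening logic on a hypothetical four-bus network using a linear (DC) power flow: take each line out in turn, re-solve, and flag any remaining line loaded beyond its rating. The bus data, reactances, and ratings are invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (from_bus, to_bus, reactance_pu, rating_pu); all values hypothetical.
Line = Tuple[int, int, float, float]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the outage islands part of the network")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def dc_flows(n_bus: int, lines: List[Line], injections: Dict[int, float]) -> List[float]:
    """Linear (DC) power flow with bus 1 as the slack; returns each line's flow, from -> to."""
    idx = {bus: k for k, bus in enumerate(range(2, n_bus + 1))}  # non-slack buses
    bmat = [[0.0] * len(idx) for _ in idx]
    for f, t, x, _ in lines:
        for u, v in ((f, t), (t, f)):
            if u in idx:
                bmat[idx[u]][idx[u]] += 1.0 / x
                if v in idx:
                    bmat[idx[u]][idx[v]] -= 1.0 / x
    theta_ns = solve(bmat, [injections.get(bus, 0.0) for bus in idx])
    theta = {1: 0.0, **{bus: theta_ns[k] for bus, k in idx.items()}}
    return [(theta[f] - theta[t]) / x for f, t, x, _ in lines]

def n_minus_1_screen(n_bus: int, lines: List[Line], injections: Dict[int, float]):
    """Take each line out in turn and report every remaining line loaded above its rating."""
    violations = []
    for k, outaged in enumerate(lines):
        remaining = lines[:k] + lines[k + 1:]
        for line, flow in zip(remaining, dc_flows(n_bus, remaining, injections)):
            if abs(flow) > line[3]:
                violations.append((outaged[:2], line[:2], round(flow, 3)))
    return violations

LINES: List[Line] = [
    (1, 2, 0.1, 1.2), (1, 3, 0.1, 1.6), (2, 3, 0.1, 1.0), (2, 4, 0.1, 1.2), (3, 4, 0.1, 1.2),
]
INJECTIONS = {2: -0.5, 4: -1.0}  # per-unit loads; the slack (bus 1) supplies the rest

print("base case flows:", [round(f, 4) for f in dc_flows(4, LINES, INJECTIONS)])
print("N-1 violations:", n_minus_1_screen(4, LINES, INJECTIONS))
```

In this invented case every base-case flow is within its rating, and the only single-line outage that causes an overload is the loss of line 1-3, which pushes 1.5 pu onto line 1-2 (rated 1.2 pu). A DC model carries no voltages or reactive power, so a screen like this cannot reveal the kind of reactive-margin weakness described above; that requires AC voltage and reactive analysis.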
III. Causal Events Leading to the Power System Cascade

This section explains the major events — electrical, operational, and computer-related — leading 
up to and causing the blackout. The period covered in this section begins at 12:15 EDT on 
August 14, when missing information on the Cinergy Bloomington-Denois Creek 230-kV line 
initially rendered MISO’s state estimator ineffective. The section ends at 16:05:57 EDT on 
August 14, when the Sammis-Star 345-kV transmission line tripped, signaling the transition from 
a local event in northeastern Ohio to the start of an uncontrolled cascade that spread through 
much of northeastern North America. 

A. Event Summary 

At 13:31, the FE Eastlake 5 generating unit tripped offline due to an exciter failure while the 
operator was making voltage adjustments. Had Eastlake 5 remained in service, subsequent line 
loadings on the 345-kV paths into Cleveland would have been slightly lower and outages due to 
tree contacts might have been delayed; there is even a remote possibility that the line trips might 
not have occurred. Loss of Eastlake 5, however, did not cause the blackout. Analysis shows that 
the FE system was still operating within FE-defined limits after the loss of Eastlake 5.

Shortly after 14:14, the alarm and logging system in the FE control room failed and was not 
restored until after the blackout. Loss of this critical control center function was a key factor in 
the loss of situational awareness of system conditions by the FE operators. Unknown to the 
operators, the alarm application failure eventually spread to a failure of multiple energy 
management system servers and remote consoles, substantially degrading the capability of the 
operators to effectively monitor and control the FE system. At 14:27, the Star-South Canton 345- 
kV tie line between FE and AEP opened and reclosed. When AEP operators called a few minutes 
later to confirm the operation, the FE operators had no indication of the operation (since the 
alarms were out) and denied their system had a problem. This was the first clear indication of a 
loss of situational awareness by the FE operators. 

Between 15:05 and 15:42, three FE 345-kV transmission lines supplying the Cleveland-Akron 
area tripped and locked out because the lines contacted overgrown trees within their rights-of- 
way. At 15:05, while loaded at less than 45 percent of its rating, FE’s Chamberlin-Harding 345- 
kV line tripped and locked out. No alarms were received in the FE control room because of the 
alarm processor failure, and the operators’ loss of situational awareness had grown from not 
being aware of computer problems to not being aware of a major system problem. After 15:05, 
following the loss of the Chamberlin-Harding line, the power system was no longer able to 
sustain the next-worst contingency without overloading facilities above emergency ratings. 

The loss of two more key 345-kV lines in northern Ohio due to tree contacts shifted power flows 
onto the underlying network of 138-kV lines. These lines were not designed to carry such large 
amounts of power and quickly became overloaded. Concurrently, voltages began to degrade in 
the Akron area. As a result of the increased loading and decaying voltages, sixteen 138-kV lines 
tripped sequentially over a period of 30 minutes (from 15:39 to 16:09), in what can best be 
described as a cascading failure of the 138-kV system in northern Ohio. Several of these line 
trips were due to the heavily loaded lines sagging into vegetation, distribution wires, and other 
underlying objects. 
Loss of the 138-kV paths, along with the previous loss of the 345-kV paths into Cleveland,
overloaded the remaining major path into the area: the FE Sammis-Star 345-kV line. Sammis- 
Star tripped at 16:05:57, signaling the beginning of an uncontrollable cascade of the power 
system. The trip was a pivotal point between a localized problem in northeastern Ohio and what 
became a wide-area cascade affecting eight states and two provinces. The loss of the heavily 
overloaded Sammis-Star line instantly created major and unsustainable burdens on other lines, 
first causing a “domino-like” sequence of line outages westward and northward across Ohio and 
into Michigan, and then eastward, splitting New York from Pennsylvania and New Jersey. The 
cascade sequence after the Sammis-Star trip is described in Section IV. 

Although overgrown trees caused an unexpected rash of non-random line trips on the FE system, 
and FE operating personnel lost situational awareness, there could have been assistance from 
MISO, FE’s reliability coordinator, had it not been for lack of visual tools and computer problems 
there as well. The first sign of trouble came at 12:15, when MISO’s state estimator experienced 
an unacceptably large mismatch error between state-estimated values and measured values. The 
error was traced to an outage of Cinergy’s Bloomington-Denois Creek 230-kV line that was not 
updated in MISO’s state estimator. The line status was quickly corrected, but the MISO analyst 
forgot to reset the state estimator to run automatically every five minutes. 

At 14:02, DP&L’s Stuart-Atlanta 345-kV line tripped and locked out due to a tree contact. By 
the time the failure to reset the MISO state estimator to run automatically was discovered at 
14:40, the state estimator was missing data on the Stuart-Atlanta outage and, when finally reset, 
again failed to solve correctly. This combination of human error and ineffective updating of line 
status information to the MISO state estimator prevented the state estimator from operating 
correctly from 12:15 until 15:34. MISO’s real-time contingency analysis, which relies on state 
estimator input, was not operational until 16:04. During this entire time, MISO was unable to 
correctly identify the contingency overload that existed on the FE system after the Chamberlin- 
Harding line outage at 15:05, and could not recognize worsening conditions as the Hanna-Juniper 
and Star-South Canton lines also failed. MISO was still receiving data from FE during this 
period, but was not aware of the line trips. 
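One generic safeguard against an estimator silently left out of automatic mode is a staleness watchdog that alarms when no valid solution has been produced within a few run periods. The sketch below is hypothetical and does not describe MISO's tools: the five-minute run period comes from the text, while the two-missed-runs allowance and the function names are assumptions. Treating 12:15 as the last valid solution, the check in this sketch trips just over ten minutes later rather than at 14:40.

```python
from datetime import datetime, timedelta

RUN_PERIOD = timedelta(minutes=5)  # the estimator's automatic run interval, per the text

def is_stale(last_good_solution: datetime, now: datetime,
             period: timedelta = RUN_PERIOD, missed_runs_allowed: int = 2) -> bool:
    """True when no valid solution has been produced for more than the allowed number of runs."""
    return now - last_good_solution > period * missed_runs_allowed

def minutes_without_solution(last_good_solution: datetime, now: datetime) -> float:
    """Elapsed minutes since the last valid solution."""
    return (now - last_good_solution).total_seconds() / 60.0

last_good = datetime(2003, 8, 14, 12, 15)  # treated as the last valid solution for illustration
for hh, mm in ((12, 24), (14, 40), (15, 34)):
    now = datetime(2003, 8, 14, hh, mm)
    print(now.strftime("%H:%M"), minutes_without_solution(last_good, now), is_stale(last_good, now))
```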

By around 15:46, when FE, MISO, and neighboring systems had begun to realize that the FE 
system was in serious jeopardy, the only practical action to prevent the blackout would have been 
to quickly drop load. Analysis indicated that at least 1,500 to 2,500 MW of load in the 
Cleveland-Akron area would have had to be shed. However, no such effort was made by the
FE operators. They still lacked sufficient awareness of system conditions at that time and had no 
effective means to shed an adequate amount of load quickly. Furthermore, the investigation 
found that FE had not provided system operators with the capability to manually or automatically 
shed that amount of load in the Cleveland area in a matter of minutes, nor did it have operational 
procedures in place for such an action. 

B. Significant Events Prior to the Start of the Blackout 

1. Eastlake Unit 5 Trips at 13:31 EDT 

Eastlake Unit 5 is located in northern Ohio along the southern shore of Lake Erie. The 
unavailability of Eastlake 4 and Davis-Besse meant that FE had to import more energy into the 
Cleveland-Akron area to support its load. This also increased the importance of the Eastlake 5 
and Perry 1 units as resources in that area. 

Throughout the morning, the EMS operators were calling the plants to request increases in 
reactive power. A key conversation took place between the EMS (system control center) operator 
and the Eastlake Unit 5 operator at approximately 13:16 on August 14: 

EMS Operator: “Eley, do you think you could help out the 345 voltage a little?” 

Eastlake 5 Operator: “Buddy, I am — yeah, I’ll push it to my max max. You’re only going to get a little bit.” 

EMS Operator: “That’s okay, that’s all I can ask.” 

The effects of the plant operator trying to go to “max max” at 13:16 are apparent in Figure III.1. 
The reactive output rose above the assumed maximum for about four minutes. A second, slight step 
increase in the unit's reactive output then appears; this increase is believed to correlate with 
the trip of a 138-kV capacitor bank in the FE system that field personnel were attempting to 
restore to service. The reactive output remained at this level for another three to four minutes, 
and then the Automatic Voltage Regulator (AVR) tripped to manual operation with a set point that 
effectively brought the (gross) Mvar output of the unit to zero. When a unit at full MW load trips 
from AVR to manual control, the excitation should not be designed or set to reduce the Mvar 
output to zero. Normal practice is to decrease the exciter to the rated full-load DC field current or 
a reasonable preset value. Subsequent investigation found that this unit was set incorrectly. 

About four or five minutes after the Mvar output decreased to zero, the operator was increasing 
the terminal voltage and attempting to place the exciter back on AVR control when the excitation 
system tripped altogether (see Figure III.1). The unit then tripped off at 13:31:34 when the 
loss-of-excitation relay operated. Later phone transcripts indicate subsequent trouble with a pump 
valve at the plant that would not re-seat after the trip. As a result, the unit could not be quickly 
returned to service. 


Figure III.1: Eastlake 5 Output Prior to Trip at 13:31 EDT (axes: MW/Mvar, kV) 

The excitation system failure tripped Eastlake Unit 5, a critical unit in the Cleveland area, and 
the effort to increase Eastlake 5 voltage did not produce the desired result. Rather, once the unit 
tripped, the attempt to increase its reactive output ended in a decrease in reactive support to the 
Cleveland-Akron area. 

At no time during the morning or early afternoon of August 14 did the FE operators indicate 
voltage problems or request any assistance from outside the FE control area for voltage support. 
FE did not report the loss of Eastlake Unit 5 to MISO. Further, MISO did not monitor system 
voltages; that responsibility was left to its member operating systems. 

When Eastlake 5 tripped, flows caused by replacement power transfers and the associated 
reactive power to support these additional imports into the area contributed to higher line 
loadings on the paths into the Cleveland area. At 15:00, FE load was approximately 12,080 MW, 
and FE was importing about 2,575 MW, or 21 percent of the total load. With imports this high, 
FE reactive power demands, already high due to the increasing air-conditioning loads that 
afternoon, were using up nearly all available reactive resources. 
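As a quick check of the arithmetic, the import share quoted above follows directly from the two figures in the text. The function name is illustrative only.

```python
def import_share(imports_mw: float, load_mw: float) -> float:
    """Return imports as a fraction of total control-area load."""
    if load_mw <= 0:
        raise ValueError("load must be positive")
    return imports_mw / load_mw


# Approximate FE figures for 15:00 EDT on August 14, as given in the text.
FE_LOAD_MW = 12_080
FE_IMPORTS_MW = 2_575

share = import_share(FE_IMPORTS_MW, FE_LOAD_MW)
print(f"Imports: {share:.1%} of load")  # Imports: 21.3% of load
```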

Simulations indicate that the loss of Eastlake 5 was an electrically significant step in the 
sequence of events, although it was not a cause of the blackout. However, contingency analysis 
simulation of the conditions immediately following the loss of the Chamberlin-Harding 345-kV 
circuit at 15:05 shows that the system was unable to sustain the next worst contingency event 
without exceeding emergency ratings. In other words, with Eastlake 5 out of service, the FE 
system was in a first contingency limit violation after the loss of the Chamberlin-Harding 345-kV 
line. However, when Eastlake 5 was modeled as being in service, all contingency violations 
were eliminated, even after the loss of Chamberlin-Harding. 

FE operators did not access contingency analysis results at any time during the day on August 
14, nor did the operators routinely conduct such studies on shift. In particular, the operators did 
not use contingency analysis to evaluate the loss of Eastlake 5 at 13:31 to determine whether the 
loss of another line or generating unit would put their system at risk. FE operators also did not 
request or evaluate a contingency analysis after the loss of Chamberlin-Harding at 15:05 (in part 
because they did not know that it had tripped out of service). Thus, FE did not discover at 15:05, 
after the Chamberlin-Harding line trip, that their system was no longer within first contingency 
criteria and that operator action was needed to immediately begin correcting the situation. 
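The first contingency (N-1) test described above can be illustrated with a minimal screening sketch. It assumes precomputed line outage distribution factors (LODFs), which estimate the fraction of a tripped line's pre-outage flow that shifts onto each remaining line. The three line names, flows, ratings, and factors are hypothetical and are not taken from the FE system. A production tool would instead run full AC power flows from a state estimator base case.

```python
# Minimal N-1 screen using line outage distribution factors (LODFs).
# All data are hypothetical; flows and ratings are in MW.

def n_minus_1_violations(flows, ratings, lodf):
    """Return (contingency, overloaded_line, post_flow_mw) for every violation.

    flows:   pre-contingency flow on each line
    ratings: emergency rating of each line
    lodf:    lodf[l][k] = share of line k's flow that moves to line l if k trips
    """
    violations = []
    for k, flow_k in flows.items():          # trip each line in turn
        for l, flow_l in flows.items():      # check every remaining line
            if l == k:
                continue
            post_flow = flow_l + lodf[l][k] * flow_k
            if abs(post_flow) > ratings[l]:
                violations.append((k, l, post_flow))
    return violations


RATINGS = {"A": 700.0, "B": 800.0, "C": 600.0}
LODF = {
    "A": {"B": 0.5, "C": 0.4},
    "B": {"A": 0.6, "C": 0.5},
    "C": {"A": 0.3, "B": 0.4},
}

base = {"A": 400.0, "B": 500.0, "C": 300.0}
stressed = {"A": 400.0, "B": 700.0, "C": 300.0}

print(n_minus_1_violations(base, RATINGS, LODF))  # []
print(n_minus_1_violations(stressed, RATINGS, LODF))
```

In the base case, every single-line outage leaves the other lines within their emergency ratings. In the stressed case, the same screen reports three contingency overloads. This is the kind of result that tells operators they are outside first contingency criteria and must act.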

FE had a state estimator that ran automatically every 30 minutes. The state estimator solution 
served as a base from which to perform contingency analyses. Interviews of FE personnel 
indicate that the contingency analysis model was likely running on August 14, but it was not 
consulted at any point that afternoon. FE indicated that it had experienced problems with the 
automatic contingency analysis operation since the system was installed in 1995. As a result, the 
practice was for FE operators or engineers to run contingency analysis manually as needed. 

2. Stuart-Atlanta 345-kV Line Trips at 14:02 EDT 

The Stuart-Atlanta 345-kV line is in the DP&L control area. After the Stuart-Atlanta line tripped, 
DP&L did not immediately provide an update of a change in equipment status using a standard 
form that posts the status change in the NERC System Data Exchange (SDX). The SDX is a 
database that maintains information on grid equipment status and relays that information to 
reliability coordinators, control areas, and the NERC IDC. The SDX was not designed as a real-time 
information system, and DP&L was required to update the line status in the SDX within 24 
hours. MISO, however, was inappropriately using the SDX to update its real-time state estimator 
model. On August 14, MISO checked the SDX to make sure that it had properly identified all 
available equipment and outages, but found no posting there regarding the Stuart-Atlanta outage. 

At 14:02:00 the Stuart-Atlanta line tripped and locked out due to contact with a tree. 

Investigators determined that the conductor had contacted five Ailanthus trees, 20 to 25 feet tall, 
burning off the tops of the trees. There was no fire reported on the ground and no fire agencies 
were contacted, disproving claims that the outage had been caused by ionization of the air around 
the conductors induced by a ground fire. Investigation modeling reveals that the loss of the 
Stuart-Atlanta line had no adverse electrical effect on power flows and voltages in the FE area, 
either immediately after its trip or later that afternoon. The Stuart-Atlanta line outage is relevant 
to the blackout only because it contributed to the failure of MISO’s state estimator to operate 
effectively, leaving MISO unable to provide adequate diagnostic support to FE until 16:04. 

3. Star-South Canton 345-kV Line Trip and Reclose 

At 14:27:16, while loaded at about 54 percent of its emergency ampere rating, the Star-South 
Canton 345-kV tie line (between AEP and FE) tripped and successfully reclosed. The digital 
fault recorder indicated a solid Phase C-to-ground fault near the FE Star station. The South 
Canton substation produced an alarm in AEP’s control room. However, due to the FE computer 
alarm system failure beginning at 14:14, the line trip and reclosure at FE Star substation were not 
alarmed at the FE control center. The FE operators had begun to lose situational awareness of 
events occurring on their system as early as 14:27, when the Star-South Canton line tripped 
momentarily and reclosed. Figure III.2 presents the initial events: the Eastlake 5 trip, the Stuart- 
Atlanta trip, and the Star-South Canton trip and reclose. 

C. FE Computer System Failures: Loss of Situational Awareness 

1. Alarm Processor Failure at 14:14 EDT 

Starting around 14:14, FE control room operators lost the alarm function that provided audible 
and visual indications when a significant piece of equipment changed from an acceptable to 
problematic status. Analysis of the alarm problem performed by FE after the blackout suggests 
that the alarm processor essentially “stalled” while processing an alarm event. With the software 
unable to complete that alarm event and move to the next one, the alarm processor buffer filled 
and eventually overflowed. After 14:14, the FE control computer displays did not receive any 
further alarms, nor were any alarms being printed or posted on the EMS’s alarm logging 
facilities. 
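The failure mode FE described can be reproduced with a toy queue model: the processor stalls on one event while new events keep arriving. The class, the buffer size, and the event that never completes are hypothetical. The sketch only shows why a stalled consumer plus a bounded buffer yields no new alarms and silently dropped events. It does not show how the XA21 software was built.

```python
from collections import deque


class ToyAlarmProcessor:
    """Hypothetical alarm processor with a bounded input buffer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = deque()
        self.displayed = []   # alarms that reached the operator display
        self.dropped = 0      # alarms lost to buffer overflow
        self.stalled = False  # internal state only; nothing reports it

    def receive(self, event: dict) -> None:
        if len(self.buffer) >= self.capacity:
            self.dropped += 1  # overflow: the event is silently lost
            return
        self.buffer.append(event)

    def process_one(self) -> None:
        if not self.buffer:
            return
        event = self.buffer[0]
        if event.get("never_completes"):
            # The handler cannot finish this event, so it stays at the head
            # of the buffer and nothing behind it is ever processed.
            self.stalled = True
            return
        self.buffer.popleft()
        self.displayed.append(event["name"])


proc = ToyAlarmProcessor(capacity=5)
events = (
    [{"name": "alarm 0"}, {"name": "alarm 1"}]
    + [{"name": "bad event", "never_completes": True}]
    + [{"name": f"alarm {i}"} for i in range(2, 11)]
)
for event in events:
    proc.receive(event)
    proc.process_one()

print(proc.displayed)                  # ['alarm 0', 'alarm 1']
print(len(proc.buffer), proc.dropped)  # 5 5
```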

FE operators relied heavily on the alarm processor for situational awareness, since they did not 
have any other large-scale visualization tool such as a dynamic map board. The operators would 
have been only partially handicapped without the alarm processor, had they known it had failed. 
However, because they did not know that they were operating without an alarm processor, the 
operators did not recognize that system conditions were changing and were not receptive to 
information received later from MISO and neighboring systems. The operators were unaware that 
in this situation they needed to monitor and interpret the SCADA information they were receiving 
manually and more closely. 

Working under the assumption that their power system was in satisfactory condition and lacking 
any EMS alarms to the contrary, FE control room operators were surprised when they began 
receiving telephone calls from others (MISO, AEP, PJM, and FE field operations staff) who 
offered information on the status of FE transmission facilities that conflicted with the FE 
system operators’ understanding of the situation. 

The first hint to FE control room staff of any 
computer problems occurred at 14:19, when a 
caller and an FE control room operator discussed 
the fact that three sub-transmission center dial-ups 
had failed. At 14:25, a control room operator 
talked again with a caller about the failure of these 
three remote terminals. The next hint came at 
14:32, when FE scheduling staff spoke about 
having made schedule changes to update the EMS 
pages, but that the totals did not update. 

There is an entry in the FE western desk operator’s 
log at 14:14 referring to the loss of alarms, but it 
appears that entry was made after-the-fact, 
referring back to the time of the last known alarm. 


Cause 1a:² FE had no alarm failure detection system. Although the FE alarm processor stopped 
functioning properly at 14:14, the computer support staff remained unaware of this failure until the 
second EMS server failed at 14:54, some 40 minutes later. Even at 14:54, the responding support 
staff understood only that all of the functions normally hosted by server H4 had failed, and did not 
realize that the alarm processor had failed 40 minutes earlier. Because FE had no periodic 
diagnostics to evaluate and report the state of the alarm processor, nothing about the eventual 
failure of two EMS servers would have directly alerted the support staff that the alarms had failed 
in an infinite loop lockup, or that the alarm processor had failed in this manner both earlier than 
and independently of the server failure events. Even if the FE computer support staff had 
communicated the EMS failure to the operators (which they did not) and fully tested the critical 
functions after restoring the EMS (which they did not), there still would have been a minimum of 
40 minutes, from 14:14 to 14:54, during which the support staff was unaware of the alarm processor 
failure. 
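One common form of the periodic diagnostic that Cause 1a says FE lacked is a heartbeat watchdog. The alarm processor records a timestamp each time it *finishes* handling an event. On a quiet system, a synthetic test alarm can be injected periodically so that beats keep coming. An independent checker then raises an alert when the last beat is too old. The class name, the five-minute timeout, and the clock-time helper below are assumptions for illustration, not features of the FE EMS.

```python
class AlarmHeartbeatWatchdog:
    """Hypothetical watchdog for an alarm processor.

    The processor calls beat() each time it finishes handling an event, so a
    process that is alive but stuck stops beating.
    """

    def __init__(self, timeout_s: int):
        self.timeout_s = timeout_s
        self.last_beat_s = None

    def beat(self, now_s: int) -> None:
        self.last_beat_s = now_s

    def is_stale(self, now_s: int) -> bool:
        if self.last_beat_s is None:
            return True  # never heard from the processor
        return now_s - self.last_beat_s > self.timeout_s


def clock(hhmm: str) -> int:
    """Convert 'HH:MM' to seconds after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


watchdog = AlarmHeartbeatWatchdog(timeout_s=5 * 60)
watchdog.beat(clock("14:14"))             # last completed alarm event
print(watchdog.is_stale(clock("14:18")))  # False (4 minutes)
print(watchdog.is_stale(clock("14:20")))  # True (6 minutes)
```

The key design choice is where the beat comes from. A check that only asks whether the process exists would have passed all afternoon. A beat tied to completed work stops as soon as the processor stalls.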


2 Causes appear in chronological order. Their numbering, however, corresponds to overall categorization 
of causes summarized in Section V. 

If any operator knew of the alarm processor failure prior to 15:42, there was no evidence from 
the phone recordings, interview transcripts, or written logs that the problem was discussed during 
that time with any other control room staff or with computer support staff. 

Although the alarm processing function failed, the remainder of the EMS continued to collect 
valid real-time status information and measurements for the FE power system, and continued to 
have supervisory control over the FE system. The FE control center continued to send its normal 
complement of information to other entities, including MISO and AEP. Thus, these other 
entities continued to receive accurate information about the status and condition of the FE power 
system, even past the point when the FE alarm processor failed. However, calls received later 
from these other entities did not begin to correct the FE operators’ loss of situational awareness 
until after 15:42. 

2. Remote Console Failures between 14:20 and 14:25 EDT 

Between 14:20 and 14:25, several FE remote control terminals in substations ceased to operate. 

FE advised the investigation team that it believes this occurred because the data feeding into those 
terminals started “queuing” and overloading the terminals’ buffers. FE system operators did not 
learn about the remote terminal failures until 14:36, when a technician at one of the sites noticed 
the terminal was not working after he came on early for the shift starting at 15:00 and called the 
main control room to report the problem. As remote terminals failed, each triggered an automatic 
page to FE computer support staff. The investigation team has not determined why some 
terminals failed whereas others did not. Transcripts indicate that data links to the remote sites 
were down as well. 

3. FE EMS Server Failures 

The FE EMS system includes several server nodes that perform the advanced EMS applications. 
Although any one of them can host all of the functions, the normal FE system configuration is to 
have several servers each host a subset of applications, with one server remaining in a “hot-standby” mode as a 
backup to the other servers, should any fail. At 14:41, the primary server hosting the EMS alarm 
processing application failed, due either to the stalling of the alarm application, the “queuing” to 
the remote terminals, or some combination of the two. Following pre-programmed instructions, 
the alarm system application and all other EMS software running on the first server automatically 
transferred (“failed-over”) onto the back-up server. However, because the alarm application 
moved intact onto the back-up while still stalled and ineffective, the back-up server failed 13 
minutes later, at 14:54. Accordingly, all of the EMS applications on these two servers stopped 
running. 
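Why failover did not help can be shown with the same kind of toy queue. If failover replicates the application's state intact, the event that stalled the primary is still at the head of the backup's queue. In this sketch, the event structure, the `run` helper, and the idea that a restart "discards the saved state" are modeling assumptions. The report states only that the alarm application moved onto the back-up while still stalled, and that a later cold reboot corrected the problem.

```python
import copy

# Hypothetical event that the alarm handler can never finish processing.
UNPROCESSABLE = {"name": "bad event", "never_completes": True}


def run(queue: list, max_steps: int = 100):
    """Process events in order; return (handled_names, stalled)."""
    handled = []
    for _ in range(max_steps):
        if not queue:
            return handled, False
        if queue[0].get("never_completes"):
            return handled, True  # stuck: the event stays at the head of the queue
        handled.append(queue.pop(0)["name"])
    return handled, False


primary = [{"name": "alarm 1"}, UNPROCESSABLE, {"name": "alarm 2"}]
print(run(primary))   # (['alarm 1'], True)

# Failover copies the application state, including the stuck queue, intact.
backup = copy.deepcopy(primary)
print(run(backup))    # ([], True)

# A restart that discards the saved state is not blocked by the old event.
fresh = [{"name": "alarm 3"}]
print(run(fresh))     # (['alarm 3'], False)
```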

The concurrent loss of two EMS servers apparently caused several new problems for the FE 
EMS and the system operators using it. Tests run during FE’s after-the-fact analysis of the alarm 
failure event indicate that a concurrent absence of these servers can significantly slow down the 
rate at which the EMS refreshes displays on operators’ computer consoles. Thus, at times on 
August 14, operator screen refresh rates, normally one-to-three seconds, slowed to as long as 59 
seconds per screen. Since FE operators have numerous information screen options, and one or 
more screens are commonly “nested” as sub-screens from one or more top level screens, the 
operators’ primary tool for observing system conditions slowed to a frustrating crawl. This 
situation likely occurred between 14:54 and 15:08, when both servers were down, and again between 
15:46 and 15:59, while FE computer support personnel attempted a “warm reboot” of both servers to 
remedy the alarm problem.³ 

Loss of the first server caused an auto-page to 
be issued to alert the FE EMS computer support 
personnel to the problem. When the back-up 
server failed, it too sent an auto-page to FE 
computer support staff. At 15:08, the support 
staff completed a warm reboot. Although the 
FE computer support staff should have been 
aware that concurrent loss of its servers would 
mean the loss of alarm processing on the EMS, 
the investigation team has found no indication 
that the IT staff informed the control room staff 
either when they began work on the servers at 
14:54 or when they completed the primary 
server restart at 15:08. At 15:42, a member of 
the computer support staff was told of the alarm 
problem by a control room operator. FE has 
stated to investigators that their computer 
support staff had been unaware before then that 
the alarm processing sub-system of the EMS 
was not working. 

Startup diagnostics monitored during that warm 
reboot verified that the computer and all 
expected processes were running. Accordingly, 
the FE computer support staff believed that they 
had successfully restarted the node and all the 
processes it was hosting. However, although 
the server and its applications were again 
running, the alarm system remained frozen and 
non-functional, even on the restarted computer. 

The computer support staff did not confirm with 
the control room operators that the alarm system 
was again working properly. 

Another casualty of the loss of both servers was the Automatic Generation Control (AGC) function 
hosted on those computers. Loss of AGC meant that FE operators could not manage affiliated power 
plants on pre-set programs to respond automatically to meet FE system load and interchange 
obligations. Although the AGC did not work from 14:54 to 15:08 and again from 15:46 to 15:59 
(periods when both servers were down), this loss of functionality does not appear to have had any 
causal effect on the blackout. 

Cause 1b: FE computer support staff did not effectively communicate the loss of alarm functionality 
to the FE system operators after the alarm processor failed at 14:14, nor did they have a formal 
procedure to do so. Knowing the alarm processor had failed would have provided FE operators the 
opportunity to detect the Chamberlin-Harding line outage shortly after 15:05 using supervisory 
displays still available in their energy management system. Knowledge of the Chamberlin-Harding 
line outage would have enabled FE operators to recognize worsening conditions on the FE system and 
to consider manually reclosing the Chamberlin-Harding line as an emergency action after subsequent 
outages of the Hanna-Juniper and Star-South Canton 345-kV lines. Knowledge of the alarm processor 
failure would have allowed the FE operators to be more receptive to information being received from 
MISO and neighboring systems regarding degrading conditions on the FE system. This knowledge would 
also have allowed FE operators to warn MISO and neighboring systems of the loss of a critical 
monitoring function in the FE control center computers, putting them on alert to more closely 
monitor conditions on the FE system, although there is not a specific procedure requiring FE to warn 
MISO of a loss of a critical control center function. The FE operators were complicit in this 
deficiency by not recognizing that the alarm processor failure existed, even though no new alarms 
were received by the operators after 14:14. A period of more than 90 minutes elapsed before the 
operators began to suspect a loss of the alarm processor, a period in which, on a typical day, 
scores of routine alarms would be expected to print to the alarm logger. 

3 A cold reboot of the XA21 system is one in which all nodes (computers, consoles, etc.) of the system are 
shut down and then restarted. Alternatively, a given XA21 node can be warm rebooted, whereby only that 
node is shut down and restarted. A cold reboot will take significantly longer to perform than a warm one. 
Also, during a cold reboot, much more of the system is unavailable for use by the control room operators 
for visibility or control over the power system. Warm reboots are not uncommon, whereas cold reboots are 
rare. All reboots undertaken by FE computer support personnel on August 14 were warm reboots. A cold 
reboot was done in the early morning of August 15, which corrected the alarm problem. 

The concurrent loss of the EMS servers also 
caused the failure of the FE strip chart function. 

Numerous strip charts are visible in the FE control 
room, driven by the EMS computers. They show 
a variety of system conditions including raw ACE 
(Area Control Error), FE system load, and 
Sammis-South Canton and Star-South Canton line 
loadings. The chart recorders continued to scroll 
but, because the underlying computer system was 
locked up, the chart pens showed only the last 
valid measurement recorded, without any variation 
from that measurement as time progressed; i.e., 
the charts “flat-lined.” There is no indication that 
any operator noticed or reported the failed 
operation of the charts. The few charts fed by 
direct analog telemetry, rather than the EMS 
system, showed primarily frequency data and 
remained available throughout the afternoon of 
August 14. 
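Stale but plausible-looking data are hard to spot by eye. The following sketch flags a telemetered value whose recent samples show no variation at all, which is the "flat-line" signature described above. The window length, tolerance, and sample values are hypothetical. A real implementation would need per-point tuning, because some quantities can legitimately hold steady for a while.

```python
def is_flatlined(samples, window: int, tolerance: float = 0.0) -> bool:
    """True if the last `window` samples vary by no more than `tolerance`.

    A live analog quantity such as ACE or a line loading normally moves a
    little from scan to scan; a trace frozen at its last value does not.
    """
    if window < 2 or len(samples) < window:
        return False  # not enough data to judge
    recent = samples[-window:]
    return max(recent) - min(recent) <= tolerance


live = [412.0, 415.5, 409.8, 418.2, 421.0, 416.4]  # hypothetical MW readings
frozen = [412.0, 415.5, 409.8, 418.2, 418.2, 418.2, 418.2, 418.2]

print(is_flatlined(live, window=5))    # False
print(is_flatlined(frozen, window=5))  # True
```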

Without an effective EMS, the only remaining 
ways to monitor system conditions would have 
been through telephone calls and direct analog 
telemetry. FE control room personnel did not 
realize that the alarm processing on the EMS was 
not working and, consequently, did not monitor 
other available telemetry. Shortly after 14:14 
when their EMS alarms failed, and until at least 15:42 when they began to recognize the gravity 
of their situation, FE operators did not understand how much of their system was being lost and 
did not realize the degree to which their perception of system conditions was in error, despite 
receiving clues via phone calls from AEP, PJM, MISO, and customers. The FE operators were 
not aware of line outages that occurred after the trip of Eastlake 5 at 13:31 until approximately 
15:45, although they were beginning to get external input describing aspects of the system’s 
weakening condition. Since FE operators were not aware and did not recognize events as they 
were occurring, they took no actions to return the system to a reliable state. Unknowingly, they 
used the outdated system condition information they did have to discount information received 
from others about growing system problems. 

4. FE EMS History 

The EMS in service at the FE Ohio control center is a GE Harris (now GE Network Systems) XA21 
system. It was initially brought into service in 1995. Other than the application of minor software 
fixes or patches typically encountered in the ongoing maintenance and support of such a system, the 
last major updates to this EMS were made in 1998, although more recent updates were available from 
the vendor. On August 14, the system was not running the most recent release of the XA21 software. 
FE had decided well before then to replace its XA21 system with an EMS from another vendor. 

Cause 1c: FE control center computer support staff did not fully test the functionality of 
applications, including the alarm processor, after a server failover and restore. After the FE 
computer support staff conducted a warm reboot of the energy management system to get the failed 
servers operating again, they did not conduct a sufficiently rigorous test of critical energy 
management system applications to determine that the alarm processor failure still existed. Full 
testing of all critical energy management functions after restoring the servers would have detected 
the alarm processor failure as early as 15:08 and would have cued the FE system operators to use an 
alternate means to monitor system conditions. Knowledge that the alarm processor was still in a 
failed state after the server was restored would have enabled FE operators to proactively monitor 
system conditions, become aware of the line outages occurring on the system, and act on operational 
information that was received. Knowledge of the alarm processor failure would also have allowed FE 
operators to warn MISO and neighboring systems, assuming there was a procedure to do so, of the loss 
of a critical monitoring function in the FE control center computers, putting them on alert to more 
closely monitor conditions on the FE system. 
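The gap identified in Cause 1c can be illustrated with the difference between a liveness check and an end-to-end functional test. The startup diagnostics at 15:08 confirmed that processes were running. An end-to-end test instead injects a harmless test alarm and confirms that it actually reaches the operator display. `ToyAlarmPath` and `post_restore_checks` are hypothetical stand-ins, not XA21 interfaces.

```python
class ToyAlarmPath:
    """Hypothetical stand-in for the path from alarm input to operator display."""

    def __init__(self, processor_stalled: bool):
        self.processor_stalled = processor_stalled
        self.display = []

    def process_running(self) -> bool:
        # Like a startup diagnostic, this only checks that the process exists;
        # a stalled processor still reports True here.
        return True

    def submit(self, alarm: str) -> None:
        if not self.processor_stalled:
            self.display.append(alarm)


def post_restore_checks(path: ToyAlarmPath) -> list:
    """Return a list of failed checks; an empty list means all passed."""
    failures = []
    if not path.process_running():
        failures.append("alarm process not running")
    # End-to-end test: inject a harmless test alarm and confirm it is displayed.
    test_alarm = "TEST ALARM: post-restore check"
    path.submit(test_alarm)
    if test_alarm not in path.display:
        failures.append("test alarm did not reach the operator display")
    return failures


print(post_restore_checks(ToyAlarmPath(processor_stalled=False)))  # []
print(post_restore_checks(ToyAlarmPath(processor_stalled=True)))
# ['test alarm did not reach the operator display']
```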

FE personnel told the investigation team that the 
alarm processing application had failed on several 
occasions prior to August 14, leading to loss of the 
alarming of system conditions and events for FE 
operators. This was, however, the first time the 
alarm processor had failed in this particular mode in 
which the alarm processor completely locked up due 
to XA21 code errors. FE computer support personnel 
neither recognized nor knew how to correct the alarm 
processor lock-up. FE staff told investigators that it 
was only during a post-outage support call with GE 
late on August 14 that FE and GE determined that the 
only available course of action to correct the alarm 
problem was a cold reboot of the XA21 system. In 
interviews immediately after the blackout, FE 
computer support personnel indicated that they discussed a cold reboot of the XA21 system with 
control room operators after they were told of the alarm problem at 15:42. However, the support 
staff decided not to take such action because the operators considered power system conditions to 
be precarious and operators were concerned about the length of time that the reboot might take 
and the reduced capability they would have until it was completed. 

D. The MISO State Estimator Is Ineffective from 12:15 to 16:04 EDT 

It is common for reliability coordinators and control areas to use a state estimator to monitor the 
power system, because it improves accuracy over the raw telemetered data. The raw data are processed 
mathematically to make a “best fit” power flow model, which can then be used in other software 
applications, such as real-time contingency analysis, to simulate various conditions and outages to 
evaluate the reliability of the power system. Real-time contingency analysis is used to alert 
operators if the system is operating insecurely; it can be run either on a regular schedule (e.g., 
every five minutes), when triggered by some system event (e.g., the loss of a power plant or 
transmission line), or when initiated by an operator. MISO usually runs its state estimator every 
five minutes and contingency analysis less frequently. If the model does not have accurate and 
timely information about key facilities, then the state estimator may be unable to reach a solution 
or it will reach a solution that is labeled as having a high degree of error. On August 14, MISO’s 
state estimator and real-time contingency analysis tools were still under development and not fully 
mature. At about 12:15, MISO’s state estimator produced a solution with a large mismatch 
outside the acceptable tolerance. This was traced to the outage at 12:12:47 of Cinergy’s 
Bloomington-Denois Creek 230-kV line. This line tripped out due to a sleeve failure. Although 
this line was out of service, its status was not updated in the state estimator. 
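The "best fit" idea, and why a wrong line status produces a large mismatch, can be sketched with a linearized (DC) weighted least squares estimator on a three-bus network. This is a deliberate simplification: MISO's estimator was a full AC model of a much larger system. The network, reactances, measurement accuracy, and readings below are all hypothetical. The measurements are generated with line 2-3 open. Fitting them with a model that assumes the line is in service leaves residuals far above the chi-square threshold, which is the analogue of a solution "with a large mismatch outside the acceptable tolerance."

```python
import math

# Hypothetical 3-bus DC model, 100-MVA base. Bus 1 is the angle reference,
# so the states are the angles at buses 2 and 3 (radians).
X_PU = 0.1       # assumed reactance of every line, per unit
SIGMA_PU = 0.01  # assumed measurement standard deviation (1 MW)


def measurement_matrix(line_23_in_service: bool):
    """Rows: flow 1-2, flow 1-3, injection at bus 2, injection at bus 3."""
    b = 1.0 / X_PU
    b23 = b if line_23_in_service else 0.0
    return [
        [-b, 0.0],
        [0.0, -b],
        [b + b23, -b23],
        [-b23, b + b23],
    ]


def solve(a, y):
    """Solve a small dense linear system by Gaussian elimination."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wls_estimate(h, z, sigma):
    """Weighted least squares fit; returns (angles, chi-square index J)."""
    w = 1.0 / sigma**2
    n = len(h[0])
    gain = [[sum(w * row[i] * row[j] for row in h) for j in range(n)] for i in range(n)]
    rhs = [sum(w * row[i] * zi for row, zi in zip(h, z)) for i in range(n)]
    theta = solve(gain, rhs)
    residuals = [zi - sum(hij * tj for hij, tj in zip(row, theta)) for row, zi in zip(h, z)]
    return theta, sum((r / sigma) ** 2 for r in residuals)


# Measurements (per unit) taken with line 2-3 actually OPEN.
z = [0.5, 0.8, -0.5, -0.8]

# 99% chi-square threshold for m - n = 4 - 2 = 2 degrees of freedom;
# -2 ln(alpha) is exact only for 2 degrees of freedom.
threshold = -2.0 * math.log(0.01)  # about 9.21

_, j_right = wls_estimate(measurement_matrix(False), z, SIGMA_PU)
_, j_wrong = wls_estimate(measurement_matrix(True), z, SIGMA_PU)
print(j_right < threshold, round(j_wrong))  # True 180
```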

Cause 1d: FE operators did not have an effective alternative to easily visualize the overall 
conditions of the system once the alarm processor failed. An alternative means of readily 
visualizing overall system conditions, including the status of critical facilities, would have 
enabled FE operators to become aware of forced line outages in a timely manner even though the 
alarms were non-functional. Typically, a dynamic map board or other type of display could provide a 
system status overview for quick and easy recognition by the operators. As with the prior causes, 
this deficiency precluded FE operators from detecting the degrading system conditions, taking 
corrective actions, and alerting MISO and neighboring systems. 

Line status information within MISO’s reliability coordination area is transmitted to MISO by the 
ECAR data network or by direct links intended to be automatically linked to the state estimator. 
This requires coordinated data naming as well as instructions that link the data to the tools. For 
the Bloomington-Denois Creek line, the automatic linkage of line status to the state estimator had 
not yet been established. The line status was corrected manually, and MISO’s analyst obtained a good 
state estimator solution at about 13:00 and a real-time contingency analysis solution at 13:07. 
However, to troubleshoot this problem, he had turned off the automatic trigger that runs the state 
estimator every five minutes. After fixing the problem, he forgot to re-enable the automatic trigger. 

So, although he had manually run the state estimator and real-time contingency analysis to reach a 
set of correct system analyses, the tools were not returned to normal automatic operation. Thinking 
the system had been successfully restored, the analyst went to lunch. 
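A disabled trigger is easy to forget because nothing looks broken. The following sketch shows a supervisory check that warns both when the automatic trigger is off and when the last valid solution is older than a few scheduled periods. The five-minute period and the 13:00 solution time come from the text. The two-period grace, the data class, and the warning wording are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class EstimatorStatus:
    auto_trigger_enabled: bool
    last_good_solution_s: int  # seconds after midnight of the last valid solution


def clock(hhmm: str) -> int:
    """Convert 'HH:MM' to seconds after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def health_warnings(status: EstimatorStatus, now_s: int,
                    period_s: int = 300, grace_periods: int = 2) -> list:
    """Warnings a hypothetical supervisory check could raise for the operator."""
    warnings = []
    if not status.auto_trigger_enabled:
        warnings.append("automatic state estimator trigger is disabled")
    age_s = now_s - status.last_good_solution_s
    if age_s > grace_periods * period_s:
        warnings.append(f"no valid solution for {age_s // 60} minutes")
    return warnings


status = EstimatorStatus(auto_trigger_enabled=False, last_good_solution_s=clock("13:00"))
print(health_warnings(status, clock("13:05")))
# ['automatic state estimator trigger is disabled']
print(health_warnings(status, clock("14:40")))
# ['automatic state estimator trigger is disabled', 'no valid solution for 100 minutes']
```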

The fact that the state estimator was not 
running automatically on its regular five- 
minute schedule was discovered at about 
14:40. The automatic trigger was re-enabled 
but again the state estimator failed to solve 
successfully. This time, the investigation 
identified the Stuart-Atlanta 345-kV line 
outage at 14:02 to be the likely cause. This 
line, jointly owned by DP&L and AEP, is 
monitored by DP&L. The line is under 
PJM’s reliability umbrella rather than 
MISO’s. Even though it affects electrical 
flows within MISO and could stall MISO’s 
state estimator, the line’s status had not been 
automatically linked to MISO’s state 
estimator. 


The discrepancy between actual measured system flows (with Stuart-Atlanta out of service) and 
the MISO model (which assumed Stuart-Atlanta was in service) was still preventing the state 
estimator from solving correctly at 15:09 when, informed by the system engineer that the Stuart- 
Atlanta line appeared to be the problem, the MISO operator said (mistakenly) that this line was 
in service. The system engineer then tried unsuccessfully to reach a solution with the Stuart- 
Atlanta line modeled as in service until approximately 15:29, when the MISO reliability 
coordinator called PJM to verify the correct status. After the reliability coordinators determined 
that Stuart-Atlanta had tripped, MISO updated the state estimator and it solved correctly. The 
real-time contingency analysis was then run manually and solved successfully at 15:41. MISO’s 
state estimator and contingency analysis were back under full automatic operation and solving 
effectively by 16:04, about two minutes before the trip of the Sammis-Star line and initiation of 
the cascade. 

In summary, the MISO state estimator and real-time contingency analysis tools were effectively 
out of service between 12:15 and 15:41 and were not in full automatic operation until 16:04. 

This prevented MISO from promptly performing pre-contingency “early warning” assessments 
of power system reliability during the afternoon of August 14. MISO’s ineffective diagnostic 
support contributed to FE’s lack of situational awareness. 


Cause 3a: MISO was using non-real-lime 
information to monitor real-time operations in its 
area of responsibility. MISO was using its 
Flowgate Monitoring Tool (FMT) as an alternative 
method of observing the real-time status of critical 
facilities within its area of responsibility. 

However, the FMT was receiving information on 
facility outages from the NERC SDX, which is not 
intended as a real-time information system and is
not required to be updated in real-time. Therefore, 
without real-time outage information, the MISO 
FMT was unable to accurately estimate real-time 
conditions within the MISO area of responsibility. 
If the FMT had received accurate line outage 
distribution factors representing current system 
topology, it would have identified a contingency 
overload on the Star-Juniper 345-kV line for the 
loss of the Hanna-Juniper 345-kV line as early as 
15:10. This information would have enabled 
MISO to alert FE operators regarding the 
contingency violation and would have allowed 
corrective actions by FE and MISO. The reliance 
on non-real-time facility status information from 
the NERC SDX is not limited to MISO; others in 
the Eastern Interconnection use the same SDX 
information to calculate TLR curtailments in the 
IDC and make operational decisions on that basis. 
What was unique compared to other reliability 
coordinators on that day was MISO’s reliance on 
the SDX for what they intended to be a real-time 
system monitoring tool. 
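The flowgate calculation described in Cause 3a rests on line outage distribution factors (LODFs). In a linearized (DC) network model, the flow on a monitored line l after the loss of line k is commonly estimated as f_l + LODF(l,k) × f_k, where the factor depends on the current network topology. The sketch below illustrates only that linear estimate; it is not MISO's FMT, and every flow, factor, and rating in it is a hypothetical value chosen to show how factors and flows computed for an out-of-date topology can hide a contingency overload.

```python
"""Linear post-contingency flow estimate using a line outage distribution factor."""


def post_contingency_flow(f_monitored, f_outaged, lodf):
    """Estimate flow (MVA) on a monitored line after another line trips.

    f_monitored: pre-contingency flow on the monitored line
    f_outaged:   pre-contingency flow on the line assumed to trip
    lodf:        fraction of the outaged line's flow that shifts onto the monitored line
    """
    return f_monitored + lodf * f_outaged


def screen(label, f_monitored, f_outaged, lodf, emergency_rating):
    """Print and return the estimated post-contingency flow for one case."""
    post = post_contingency_flow(f_monitored, f_outaged, lodf)
    status = "VIOLATION" if post > emergency_rating else "within limit"
    print(f"{label}: {post:.0f} MVA, {100 * post / emergency_rating:.0f}% of rating ({status})")
    return post


# All values below are hypothetical illustrations, not August 14 data.
RATING = 1200.0  # assumed emergency rating of the monitored line (MVA)

# Stale model: flows and factor computed as if an already-tripped parallel line were in service.
stale = screen("stale topology", 700.0, 900.0, 0.45, RATING)

# Updated model (assumed): the earlier outage has shifted more flow onto both lines and removed
# a parallel path, so a larger share of the outaged line's flow lands on the monitored line.
current = screen("current topology", 800.0, 1100.0, 0.60, RATING)
```

With these assumed numbers the stale model reports about 92 percent of the emergency rating and the updated model about 122 percent, which mirrors the point above that factors reflecting current topology would have exposed the contingency overload.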
E. Precipitating Events: 345-kV Transmission Line Trips: 15:05 to 15:41 EDT

1. Summary 

From 15:05:41 to 15:41:35, three 345-kV lines failed with power flows at or below each
transmission line’s emergency rating. Each trip and lockout was the result of contact between
an energized line and a tree that had grown so tall that it had encroached into the minimum safe
clearance for the line. As each line failed, its power flow shifted onto the remaining transmission
paths, and voltages on the rest of the FE system degraded further. The following key events
occurred during this period:

• 15:05:41: The Chamberlin-Harding 345-kV line tripped, reclosed, tripped again, and 
locked out. 

• 15:31-33: MISO called PJM to determine if PJM had seen the Stuart-Atlanta 345-kV line 
outage. PJM confirmed Stuart-Atlanta was out. 

• 15:32:03: The Hanna-Juniper 345-kV line tripped, reclosed, tripped again, and locked 
out. 

• 15:35: AEP asked PJM to begin work on a 350 MW TLR to relieve overloading on the 
Star-South Canton line, not knowing the Hanna-Juniper 345-kV line had already tripped 
at 15:32. 

• 15:36: MISO called FE regarding a post-contingency overload on the Star-Juniper 345-kV
line for the contingency loss of the Hanna-Juniper 345-kV line, unaware at the start of
the call that Hanna-Juniper had already tripped. MISO used the FMT to arrive at this
assessment.

• 15:41:33 to 15:41:35: The Star-South Canton 345-kV line tripped, reclosed, tripped again at
15:41, and remained out of service, while AEP and PJM were discussing TLR relief
options to reduce loading on the line.

2. Chamberlin-Harding 345-kV Line Trip at 15:05 EDT 

Figure III.3 shows the location of the Chamberlin-Harding line and the two subsequent critical 
line trips. 
Figure III.3 — Location of Three Line Trips


At 15:05:41, the FE Chamberlin-Harding 345-kV line tripped and locked out while loaded at 500
MVA or 43.5 percent of its normal and emergency ratings of 1,195 MVA (which are the same
values). At 43.5 percent loading, the conductor temperature did not exceed its design temperature,
so the line could not have sagged enough for investigators to conclude that an overload had caused
it to sag into the tree. Instead, investigators determined that FE had allowed trees in the
Chamberlin-Harding right-of-way to grow too tall and encroach into the minimum safe clearance
from a 345-kV energized conductor. The investigation team examined the relay data for this trip,
which indicated a high-impedance Phase C fault to ground, and identified the geographic location
of the fault. They determined that the relay data matched the classic signature pattern for a
tree-to-line fault (Figure III.4). Chamberlin-Harding tripped on a directional ground relay, part of
a directional comparison relay scheme protecting the line.



Figure III.4 — Juniper DFR Indication of Tree Contact for Loss of the Chamberlin-Harding Line
Going to the fault location as determined from
the relay data, the field team found the remains 
of trees and brush. At this location, conductor 
height measured 46 feet 7 inches, while the 
height of the felled tree measured 42 feet; 
however, portions of the tree had been removed 
from the site. This means that while it is 
difficult to determine the exact height of the line 
contact, the measured height is a minimum and 
the actual contact was likely three to four feet 
higher than estimated here. Burn marks were
observed 35 feet 8 inches up the tree, and the
crown of this tree was at least six feet taller than
the observed burn marks. The tree showed
evidence of fault current damage. 

To be sure that the evidence of tree-to-line 
contacts and tree remains found at each site were 
linked to the events of August 14, the team 
looked at whether these lines had any prior 
history of outages in preceding months or years 
that might have resulted in the burn marks,
debarking, and other evidence of line-tree 
contacts. Records establish that there were no 
prior sustained outages known to be caused by 
trees for these lines in 2001, 2002, or 2003. 
Chamberlin-Harding had zero outages for those 
years. Hanna-Juniper had six outages in 2001, 
ranging from four minutes to a maximum of 34 
minutes — two were from an unknown cause, 
one was caused by lightning, and three were 
caused by a relay failure or mis-operation.
Star-South Canton had no outages in that same
two-and-a-half year period.


Cause 2: FE did not effectively manage 
vegetation in its transmission line rights-of- 
way. The lack of situational awareness 
resulting from Causes 1a through 1e would have
allowed a number of system failure modes to 
go undetected. However, it was the fact that 
FE allowed trees growing in its 345-kV 
transmission rights-of-way to encroach 
within the minimum safe clearances from 
energized conductors that caused the 
Chamberlin-Harding, Hanna-Juniper, and 
Star-South Canton 345-kV line outages. 
These three tree-related outages triggered the 
localized cascade of the Cleveland-Akron 
138-kV system and the over-loading and 
tripping of the Sammis-Star line, eventually 
snowballing into an uncontrolled wide-area 
cascade. These three lines experienced
non-random, common mode failures due to
unchecked tree growth. With properly cleared
rights-of-way and calm weather, such as 
existed in Ohio on August 14, the chances of 
those three lines randomly tripping within 30 
minutes are extremely small. Effective
vegetation management practices would have 
avoided this particular sequence of line 
outages that triggered the blackout. 

However, effective vegetation management 
might not have precluded other latent failure 
modes. For example, investigators
determined that there was an elevated risk of a
voltage collapse in the Cleveland-Akron area 
on August 14 if the Perry 1 nuclear plant had 
tripped that afternoon in addition to Eastlake 
5, because the transmission system in the 
Cleveland-Akron area was being operated 
with low bus voltages and insufficient 
reactive power margins to remain stable 
following the loss of Perry 1. 


Like most transmission owners, FE patrols its lines regularly, flying over each transmission line
twice a year to check on the condition of the rights-of-way. Notes from flyovers in 2001 and 2002
indicate that the examiners saw a significant number of trees and brush that needed clearing or
trimming along many FE transmission lines.

FE operators were not aware that the system was operating outside first contingency limits after 
the Chamberlin-Harding trip (for the possible loss of Hanna-Juniper), because they did not 
conduct a contingency analysis. The investigation team has not determined whether the system 
status information used by the FE state estimator and contingency analysis model was being 
accurately updated. 

Chamberlin-Harding was not one of the flowgates that MISO monitored as a key transmission 
location, so the reliability coordinator was unaware when FE’s first 345-kV line failed. Although 
MISO received SCADA input of the line’s status change, this was presented to MISO operators
as breaker status changes rather than a line failure.

Because their EMS system topology processor had 
not yet been linked to recognize line failures, it did 
not connect the breaker information to the loss of a 
transmission line. Thus, MISO’s operators did not 
recognize the Chamberlin-Harding trip as a 
significant event and could not advise FE 
regarding the event or its consequences. 

Further, without its state estimator and associated 
real-time contingency analysis, MISO was unable 
to identify potential overloads that would occur 
due to various line or equipment outages. 

Accordingly, when the Chamberlin-Harding 345-kV
line tripped at 15:05, the state estimator did not
produce results and could not predict an overload
if the Hanna-Juniper 345-kV line were also to fail.

MISO did not discover that Chamberlin-Harding 
had tripped until after the blackout, when MISO 
reviewed the breaker operation log that evening. 

FE indicates that it discovered the line was out 
while investigating system conditions in response 
to MISO’s call at 15:36, when MISO told FE that 
MISO’s flowgate monitoring tool showed a
Star-Juniper line overload following a contingency loss
of Hanna-Juniper. However, investigators found 
no evidence within the control room logs or 
transcripts to show that FE knew of the 
Chamberlin-Harding line failure until after the blackout. 

When the Chamberlin-Harding line locked out, the loss of this path caused the remaining three 
345-kV paths into Cleveland from the south to pick up more load, with Hanna-Juniper picking up 
the most. The Chamberlin-Harding outage also caused more power to flow through the 
underlying 138-kV system. 

3. FE Hanna-Juniper 345-kV Line Trips at 15:32 EDT 

Incremental line current and temperature increases, escalated by the loss of Chamberlin-Harding,
caused the Hanna-Juniper line to sag into a tree. The resulting fault tripped and locked out the line
at 15:32:03, with current flow at 2,050 amperes or 87.5 percent of its normal and emergency line
rating of 2,344 amperes. Figure III.5 shows the Juniper digital fault recorder indicating the tree
signature of a high-impedance ground fault. Analysis showed that high arc resistance limited the
actual fault current to well below the current calculated for a “bolted” (no arc resistance) fault.
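The loading figure in this paragraph is a ratio of line current to rating, and the effect of arc resistance can be seen from the standard symmetrical-component expression for a single-line-to-ground fault, I_f = 3·V_phase / |Z1 + Z2 + Z0 + 3·R_f|. The sketch below reproduces the 87.5 percent figure from the currents quoted in the text. The sequence impedances and the 20-ohm fault resistance are hypothetical values, not Hanna-Juniper parameters, used only to show how a resistive tree contact lowers fault current relative to a bolted fault.

```python
import math


def percent_of_rating(flow, rating):
    """Loading as a percentage of a rating expressed in the same units."""
    return 100.0 * flow / rating


def slg_fault_current(v_line_to_line, z1, z2, z0, r_fault=0.0):
    """Single-line-to-ground fault current magnitude in amperes.

    Uses I_f = 3 * V_phase / |Z1 + Z2 + Z0 + 3 * R_f|, with complex sequence
    impedances in ohms and a purely resistive fault path R_f in ohms.
    """
    v_phase = v_line_to_line / math.sqrt(3)
    return 3 * v_phase / abs(z1 + z2 + z0 + 3 * r_fault)


# Hanna-Juniper at the moment of the trip (amperes, from the report text).
loading = percent_of_rating(2050, 2344)

# Hypothetical sequence impedances to the fault point and a hypothetical tree/arc resistance.
Z1, Z2, Z0 = 10j, 10j, 30j
bolted = slg_fault_current(345e3, Z1, Z2, Z0)
through_tree = slg_fault_current(345e3, Z1, Z2, Z0, r_fault=20.0)

print(f"Hanna-Juniper loading: {loading:.1f}% of rating")
print(f"bolted fault: {bolted:.0f} A, with 20-ohm fault resistance: {through_tree:.0f} A")
```

Because the fault resistance enters the denominator as 3·R_f in series with the sequence impedances, even a modest resistive path through a tree noticeably reduces the current the relays see, which is consistent with the high-impedance signature described above.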


Cause 1e: FE did not have an effective
contingency analysis capability cycling 
periodically on-line and did not have a 
practice of running contingency analysis 
manually as an effective alternative for 
identifying contingency limit violations. 
Real-time contingency analysis, cycling 
automatically every 5-15 minutes, would 
have alerted the FE operators to degraded 
system conditions following the loss of 
the Eastlake 5 generating unit and the 
Chamberlin-Harding 345-kV line. 
Initiating a manual contingency analysis 
after the trip of the Chamberlin-Harding 
line could also have identified the 
degraded system conditions for the FE 
operators. Knowledge of a contingency
limit violation after the loss of
Chamberlin-Harding and knowledge that
conditions continued to worsen with the 
subsequent line losses would have allowed 
the FE operators to take corrective actions 
and notify MISO and neighboring systems 
of the developing system emergency. After
the trip of the Chamberlin-Harding 345-kV
line at 15:05, FE was operating in a state
in which the loss of the Perry 1 nuclear unit
would have caused one or more lines to
exceed their emergency ratings.
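Cause 1e turns on periodic contingency analysis. A minimal sketch of what such a tool computes, assuming a linear DC power-flow model, is a brute-force n-1 loop: remove each line in turn, re-solve the network, and flag every remaining line whose flow exceeds its emergency rating. The four-bus network, reactances, ratings, and load below are hypothetical, and production EMS contingency analysis (AC power flow, far larger models, alarm handling) is considerably more involved.

```python
"""Brute-force n-1 contingency screening on a hypothetical 4-bus DC network."""

BASE_MVA = 100.0
# (from_bus, to_bus, reactance_pu, emergency_rating_MW): all assumed values
LINES = [
    (0, 1, 0.1, 300.0),
    (0, 2, 0.1, 300.0),
    (1, 3, 0.1, 300.0),
    (2, 3, 0.1, 300.0),
    (1, 2, 0.3, 300.0),
]
# Net injection per bus in MW; bus 0 is the slack that supplies the 400 MW load at bus 3.
INJECTIONS = [0.0, 0.0, 0.0, -400.0]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular network matrix: a bus may be islanded")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dc_flows(lines):
    """MW flow on each line (positive from_bus -> to_bus), bus 0 as angle reference."""
    n = len(INJECTIONS)
    b = [[0.0] * n for _ in range(n)]
    for i, j, x, _ in lines:
        b[i][i] += 1 / x
        b[j][j] += 1 / x
        b[i][j] -= 1 / x
        b[j][i] -= 1 / x
    reduced = [row[1:] for row in b[1:]]  # drop the slack row and column
    p_pu = [p / BASE_MVA for p in INJECTIONS[1:]]
    theta = [0.0] + solve(reduced, p_pu)
    return [BASE_MVA * (theta[i] - theta[j]) / x for i, j, x, _ in lines]


def n_minus_1(lines):
    """Return (outaged line, overloaded line, flow MW, rating MW) for every violation."""
    violations = []
    for k, (oi, oj, _, _) in enumerate(lines):
        remaining = lines[:k] + lines[k + 1:]
        for (i, j, _, rating), flow in zip(remaining, dc_flows(remaining)):
            if abs(flow) > rating:
                violations.append(((oi, oj), (i, j), flow, rating))
    return violations


if __name__ == "__main__":
    print("base case:", [round(f) for f in dc_flows(LINES)])
    for outaged, overloaded, flow, rating in n_minus_1(LINES):
        print(f"loss of {outaged}: line {overloaded} at {abs(flow):.0f} MW > {rating:.0f} MW")
```

In this toy network the base case is within all ratings, yet several single outages overload a parallel path. That is the condition the text describes: a system that looks normal on actual flows while already violating first-contingency limits, which only a contingency calculation reveals.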
Figure III.5 — Juniper DFR Indication of Tree Contact for Loss of Hanna-Juniper

The tree contact occurred on the south phase of the Hanna-Juniper line, which is lower than the 
center phase due to construction design. Although little evidence remained of the tree during the 
field visit in October, the team observed a tree stump 14 inches in diameter at its ground line and 
talked to a member of the FE tree-trimming crew who witnessed the contact on August 14 and 
reported it to the FE operators. FE was conducting right-of-way vegetation maintenance, and the
tree crew at Hanna-Juniper was three spans away clearing vegetation near the line when the
contact occurred on August 14. FE provided photographs that clearly indicate that the tree was of
excessive height. Similar trees nearby but not in the right-of-way were 18 inches in diameter at 
ground line and 60 feet in height. Further inspection showed at least 20 trees growing in this 
right-of-way. 

When the Hanna-Juniper line tripped at 15:32:03, the Harding-Juniper 345-kV line tripped 
concurrently. Investigators believe the Harding-Juniper operation was an overtrip caused by a 
damaged coaxial cable that prevented the transmission of a blocking signal from the Juniper end 
of the line. Then the Harding-Juniper line automatically initiated a high-speed reclosure of both 
ring bus breakers at Juniper and one ring bus breaker at Harding. The A-Phase pole on the 
Harding breaker failed to reclose. This caused unbalanced current to flow in the system until the 
second Harding breaker reclosed automatically 7.5 seconds later. 

Hanna-Juniper was loaded at 87.5 percent of its normal and emergency rating when it tripped. 
With this line open, almost 1,200 MVA had to find a new path to reach the loads in Cleveland. 
Loading on the remaining two 345-kV lines increased, with Star-Juniper taking the most of the 
power shift. This caused the loading on Star-South Canton to rise above normal but within its 
emergency rating, and pushed more power onto the 138-kV system. Flows west into Michigan 
decreased slightly and voltages declined somewhat in the Cleveland area. 

Because its alarm system was not working, FE was not aware of the Chamberlin-Harding or 
Hanna-Juniper line trips. However, once MISO manually updated the state estimator model for 
the Stuart-Atlanta line outage, the software successfully completed a state estimation and 
contingency analysis at 15:41. But this left a 36-minute period, from 15:05 to 15:41, during 
which MISO did not anticipate the potential consequences of the Hanna-Juniper loss, and FE 
operators knew of neither the line’s loss nor its
consequences. PJM and AEP recognized the 
overload on Star-South Canton, but had not 
expected it because their earlier contingency 
analysis did not examine enough lines within 
the FE system to foresee this result of the 
Hanna-Juniper contingency on top of the 
Chamberlin-Harding outage. 

According to interviews, AEP had a 
contingency analysis capability that covered 
lines into Star. At about 15:33, the AEP
operator identified a potential Star-South Canton
overload for the loss of the Sammis-Star line
and, at 15:35, asked PJM to begin developing a 350
MW TLR to mitigate it. The TLR was to 
relieve the actual overload above the normal 
rating then occurring on Star-South Canton, 
and to prevent an overload above the 
emergency rating on that line for the loss of 
Sammis-Star. But when they began working 
on the TLR, neither AEP nor PJM realized that 
Hanna-Juniper had already tripped at 15:32, 
further degrading system conditions. Most 
TLRs are for cuts of 25 to 50 MW. A 350 
MW TLR request was highly unusual and the 
operators were attempting to confirm why so 
much relief was suddenly required before 
implementing the requested TLR. 

Less than ten minutes elapsed between the loss of Hanna-Juniper, the overload above the normal 
limits of Star-South Canton, and the Star-South Canton trip and lock-out. This shortened time 
span between the Hanna-Juniper and Star-South Canton line trips is a first hint that the pace of 
events was beginning to accelerate. This activity between AEP and PJM was the second time on 
August 14 an attempt was made to remove actual and contingency overloads using an 
administrative congestion management procedure (reallocation of transmission through TLR) 
rather than directly ordering generator shifts to relieve system overloads first. The prior incident 
was the TLR activity between Cinergy and MISO for overloads on the Cinergy system. 

The primary means MISO was using to assess reliability on key flowgates was its flowgate 
monitoring tool. After the Chamberlin-Harding 345-kV line outage at 15:05, the FMT produced 
incorrect results because the outage was not reflected in the model. As a result, the tool assumed 
that Chamberlin-Harding was still available and did not predict an overload for the loss of the 
Hanna-Juniper 345-kV line. 

When Hanna-Juniper tripped at 15:32, the resulting overload was detected by MISO SCADA and
set off alarms to MISO’s system operators, who then phoned FE about it. Because MISO’s state
estimator was still in a developmental state and not working properly, and because the flowgate
monitoring tool did not have updated line status information, MISO’s ability to recognize
evolving contingency situations on the FE system was impaired.


Cause 3b: MISO did not have real-time 
topology information for critical lines mapped 
into its state estimator. The MISO state 
estimator and network analysis tools were still 
considered to be in development on August 14 
and were not fully capable of automatically 
recognizing changes in the configuration of the 
modeled system. Following the trip of lines in 
the Cinergy system at 12:12 and the DP&L 
Stuart-Atlanta line at 14:02, the MISO state 
estimator failed to solve correctly as a result of 
large numerical mismatches. MISO real-time 
contingency analysis, which operates only if the 
state estimator solves, did not operate properly 
in automatic mode again until after the
blackout. Without real-time contingency analysis
information, the MISO operators did not detect 
that the FE system was in a contingency 
violation after the Chamberlin-Harding 345-kV 
line tripped at 15:05. Since MISO was not 
aware of the contingency violation, MISO did 
not inform FE and thus FE’s lack of situational 
awareness described in Causes 1a through 1e was allowed
to continue. With an operational state estimator 
and real-time contingency analysis, MISO 
operators would have known of the contingency 
violation and could have informed FE, thus 
enabling FE and MISO to take timely actions to 
return the system to within limits. 
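Cause 3b describes a state estimator that could not reconcile telemetry with a model that still showed a tripped line in service. The toy weighted-least-squares (WLS) estimator below, written against a linear DC model, illustrates the effect: when a line that has actually opened is modeled as closed, no set of bus angles fits the measurements and the weighted residual J far exceeds a bad-data threshold, whereas the corrected topology fits exactly. The three-bus network, reactances, meter accuracy, telemetry, and threshold are all hypothetical, and this is not a description of MISO's estimator.

```python
"""Toy DC weighted-least-squares state estimator with a topology error (hypothetical)."""

X = 0.1            # assumed reactance of every line, per unit
SIGMA = 0.01       # assumed standard deviation of every meter, per unit
THRESHOLD = 9.21   # assumed bad-data bound: ~99th percentile of chi-square with 2 d.o.f.

# Hypothetical telemetry from the real system, in which line 1-3 has tripped:
# [flow 1->2, flow 1->3, flow 2->3, net injection at bus 3], per unit.
Z = [1.0, 0.0, 1.0, -1.0]


def jacobian(line13_in_service):
    """Measurement rows with respect to the unknown angles (theta2, theta3); theta1 = 0."""
    b = 1.0 / X
    b13 = b if line13_in_service else 0.0
    return [
        [-b, 0.0],       # P12 = (theta1 - theta2) / x12
        [0.0, -b13],     # P13 = (theta1 - theta3) / x13, identically zero if the line is open
        [b, -b],         # P23 = (theta2 - theta3) / x23
        [-b, b13 + b],   # P3  = (theta3 - theta1) / x13 + (theta3 - theta2) / x23
    ]


def estimate(line13_in_service):
    """Solve the WLS normal equations and return the angles and the weighted residual J."""
    h = jacobian(line13_in_service)
    # With equal meter weights the normal equations reduce to (H^T H) theta = H^T z.
    g11 = sum(r[0] * r[0] for r in h)
    g12 = sum(r[0] * r[1] for r in h)
    g22 = sum(r[1] * r[1] for r in h)
    b1 = sum(r[0] * z for r, z in zip(h, Z))
    b2 = sum(r[1] * z for r, z in zip(h, Z))
    det = g11 * g22 - g12 * g12
    theta2 = (b1 * g22 - g12 * b2) / det
    theta3 = (g11 * b2 - g12 * b1) / det
    residuals = [z - (r[0] * theta2 + r[1] * theta3) for r, z in zip(h, Z)]
    j = sum((res / SIGMA) ** 2 for res in residuals)
    return (theta2, theta3), j


if __name__ == "__main__":
    for in_service in (True, False):
        (t2, t3), j = estimate(in_service)
        verdict = "fails bad-data test" if j > THRESHOLD else "consistent"
        print(f"line 1-3 modeled in service={in_service}: J={j:.1f} -> {verdict}")
```

Real estimators are nonlinear and vastly larger, but in this toy case the mechanism is the same as in the text: an unmodeled topology change leaves a residual that no choice of state can remove until the model's line status is corrected.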
Although an inaccuracy was identified with MISO’s flowgate monitoring tool, it still functioned
with reasonable accuracy and prompted MISO to call FE to discuss the Hanna-Juniper line 
problem. The FMT showed an overload at 108 percent of the operating limit. However, the tool 
did not recognize that Chamberlin-Harding was out of service at the time. If the distribution 
factors had been updated, the overload would have appeared to be even greater. It would not 
have identified problems south of Star since that was not part of the flowgate and thus was not 
modeled in MISO’s flowgate monitor. 

4. Loss of the Star-South Canton 345-kV Line at 15:41 EDT 

The Star-South Canton line crosses the boundary between FE and AEP; each company owns the 
portion of the line within its service territory and manages the right-of-way for that portion. The 
Star-South Canton line tripped and reclosed three times on the afternoon of August 14, first at
14:27:15 while operating at less than 55 percent of its rating. With the loss of Chamberlin- 
Harding and Hanna-Juniper, there was a substantial increase in load on the Star-South Canton 
line. This line, which had relayed and reclosed earlier in the afternoon at 14:27, again relayed 
and reclosed at 15:38:47. It later relayed and locked out in a two-second sequence from 15:41:33 
to 15:41:35 on a Phase C ground fault. Subsequent investigation found substantial evidence of 
tree contact. This fault, however, did not have the typical signature of a high-impedance ground
fault.

Following the first trip of the Star-South Canton line at 14:27, AEP called FE at 14:32 to discuss 
the trip and reclose of the line. AEP was aware of breaker operations at their end (South Canton) 
and asked about operations at FE’s Star end. FE indicated they had seen nothing at their end of 
the line but AEP reiterated that the trip occurred at 14:27 and that the South Canton breakers had 
reclosed successfully. 

There was an internal FE conversation about the AEP call at 14:51 expressing concern that they 
had not seen any indication of an operation; but, lacking evidence within their control room, the 
FE operators did not pursue the issue. According to the transcripts, FE operators dismissed the 
information as either not accurate or not relevant to their system, without following up on the 
discrepancy between the AEP event and the information observed in the FE control room. There 
was no subsequent verification of conditions with MISO. Missing the trip and reclose of the
Star-South Canton line at 14:27, despite a call from AEP inquiring about it, was a clear indication
that the FE operators’ loss of situational awareness had begun.

At 15:19, AEP called FE back to confirm that the Star-South Canton trip had occurred and that an 
AEP technician had confirmed the relay operation at South Canton. The FE operator restated that 
because they had received no trouble alarms, they saw no problem. At 15:20, AEP decided to 
treat the South Canton digital fault recorder and relay target information as a spurious relay 
operation and to check the carrier relays to determine what the problem might be. 

A second trip and reclose of Star-South Canton occurred at 15:38:48. Finally, at 15:41:35, the 
line tripped and locked out at the Star substation. A short-circuit-to-ground occurred in each 
case. Less than ten minutes after the Hanna-Juniper line trip at 15:32, Star-South Canton tripped 
with power flow at 93.2 percent of its emergency rating. AEP had called FE three times between 
the initial trip at 14:27 and 15:45 to determine if FE knew the cause of the line trips. 

Investigators inspected the right-of-way at the location indicated by the relay digital fault 
recorders, which was in the FE portion of the line. They found debris from trees and vegetation 
that had been felled. At this location, the conductor height was 44 feet 9 inches. The identifiable 
tree remains measured 30 feet in height, although the team could not verify the location of the
stump or find all sections of the tree. A nearby cluster of trees showed significant fault damage,
including charred limbs and de-barking from fault current. Topsoil in the area of the tree trunk
was also disturbed, discolored and broken up, a common indication of a higher magnitude fault or
multiple faults. Analysis of another stump showed that a fourteen-year-old tree had recently been
removed from the middle of the right-of-way.

Only after AEP notified FE that the Star-South Canton 345-kV circuit had tripped and locked out
at 15:42 did the FE control area operator compare this information to the breaker statuses for
their end of the line at Star. After becoming aware at 15:42 that system conditions had changed
due to unscheduled equipment outages that might affect other control areas, the FE operator did
not immediately inform MISO and the adjacent control areas.

After the Star-South Canton line was lost, flows increased greatly on the 138-kV system toward 
Cleveland, and the Akron area voltage levels began to degrade on the 138-kV and 69-kV system. 
At the same time, power flow was increasing on the Sammis-Star line due to the 138-kV line trips 
and the dwindling number of remaining transmission paths into Cleveland from the south. 

5. Degrading System Conditions After the 345-kV Line Trips 

Figure III.6 shows the line loadings calculated by the investigation team as the 345-kV lines in 
northeast Ohio began to trip. Showing line loadings on the 345-kV lines as a percent of normal 
rating, the graph tracks how the loading on each line increased as each subsequent 345-kV and 
138-kV line tripped out of service between 15:05 (Chamberlin-Harding) and 16:06 (Dale-West 
Canton). As the graph shows, none of the 345- or 138-kV lines exceeded their normal ratings on 
an actual basis (although contingency overloads existed) until after the combined trips of 
Chamberlin-Harding and Hanna-Juniper. But immediately after Hanna-Juniper was lost,
Star-South Canton’s loading jumped from an estimated 82 percent of normal to 120 percent of normal
(still below its emergency rating) and remained at that level for ten minutes before tripping out.

To the right, the graph shows the effects of the 138-kV line failures (discussed next) on the 
remaining 345-kV line, i.e., Sammis-Star’s loading increased steadily above 100 percent with 
each succeeding 138-kV line lost. 
Figure III.6 — Line Loadings as the Northeast Ohio 345-kV Lines Trip


Following the loss of the Chamberlin-Harding 345-kV line, contingency limit violations existed 
for: 

• The Star-Juniper 345-kV line, whose loading would exceed its emergency limit for the 
loss of the Hanna-Juniper 345-kV line; and 

• The Hanna-Juniper and Harding-Juniper 345-kV lines, whose loadings would exceed 
emergency limits for the loss of the 1,255 MW Perry Nuclear Generating Plant. 

Operationally, once the FE system entered an n-1 contingency violation state at 15:05 after the
loss of Chamberlin-Harding, any further facility loss pushed it into a less reliable state. To
restore the system to a reliable operating state, FE needed to reduce loading on the Star-Juniper,
Hanna-Juniper, and Harding-Juniper lines (normally within 30 minutes) such that no single
contingency would violate an emergency limit on one of those lines. Because of the non-random
nature of events that afternoon (overgrown trees contacting lines), not even a 30-minute response
time was adequate as events were beginning to speed up. The Hanna-Juniper line tripped and
locked out at 15:32, only 27 minutes after Chamberlin-Harding.

6. Phone Calls Indicated Worsening Conditions 

During the afternoon of August 14, FE operators talked to their field personnel, MISO, PJM, 
adjoining systems (such as AEP), and customers. The FE operators received pertinent 
information from all of these sources, but did not grasp some key information about the condition 
of the system from the clues offered. This information included a call from the FE eastern control 
center asking about possible line trips, a call from the Perry nuclear plant regarding what looked
like near-line trips, AEP calling about their end of the Star-South Canton line tripping, and MISO 
and PJM calling about possible line overloads. 

At 15:35, the FE control center received a call from the Mansfield 2 plant operator, concerned 
about generator fault recorder triggers and excitation voltage spikes with an alarm for
over-excitation; a dispatcher called reporting a “bump” on their system. Soon after this call, the FE
Reading, Pennsylvania, control center called reporting that fault recorders in the Erie west and 
south areas had activated, and wondering if something had happened in the Ashtabula-Perry area. 
The Perry nuclear plant operator called to report a “spike” on the unit’s main transformer. When 
he went to look at the metering it was “still bouncing around pretty good. I’ve got it relay tripped 
up here ... so I know something ain’t right.” 

It was at about this time that the FE operators began to suspect something might be wrong, but 
did not recognize that the problems were on their own system. “It’s got to be in distribution, or 
something like that, or somebody else’s problem ... but I’m not showing anything.” Unlike 
many transmission system control centers, the FE center did not have a map board, which might 
have shown the location of significant line and facility outages within the control area. 

At 15:36, MISO contacted FE regarding the post-contingency overload on Star-Juniper for the 
loss of the Hanna-Juniper 345-kV line. Unknown to MISO and FE, Hanna-Juniper had already 
tripped four minutes earlier. 

At 15:42, the FE western transmission operator informed the FE computer support staff that the 
EMS system functionality was compromised. “Nothing seems to be updating on the 
computers.... We’ve had people calling and reporting trips and nothing seems to be updating in 
the event summary... I think we’ve got something seriously sick.” This is the first evidence that 
a member of the FE control room operating staff recognized that their EMS system was degraded. 
There is no indication that he informed any of the other operators at this time. However, the FE 
computer support staff discussed the subsequent EMS corrective action with some control room 
operators shortly thereafter. 

Also at 15:42, the Perry plant operator called back with more evidence of problems. “I’m still 
getting a lot of voltage spikes and swings on the generator.... I don’t know how much longer
we’re going to survive.” 

At 15:45, the tree trimming crew reported that they had witnessed a tree-caused fault on the 
Eastlake-Juniper line. However, the actual fault was on the Hanna-Juniper line in the same 
vicinity. This information added to the confusion in the FE control room because the operator 
had indication of flow on the Eastlake-Juniper line. 

After the Star-South Canton line tripped a third time and locked out at 15:42, AEP called FE at
15:45 to discuss the trip and to inform FE that AEP had additional lines showing overloads. FE
recognized then that the Star breakers had tripped and remained open.

At 15:46, the Perry plant operator called the FE control room a third time to say that the unit was 
close to tripping off: “It’s not looking good.... We ain’t going to be here much longer and you’re 
going to have a bigger problem.” 

At 15:48, an FE transmission operator sent staff to man the Star substation, and then at 15:50, 
requested staffing at the regions, beginning with Beaver, then East Springfield. This act, 43 
minutes after the Chamberlin-Harding line trip and 18 minutes before the Sammis-Star trip,
signaled the start of the cascade, and was the first clear indication that at least one of the FE 
system operating staff was beginning to recognize that an emergency situation existed. 

At the same time the activities above were unfolding at FE, AEP operators grew quite concerned 
about the events unfolding on their ties with FE. Beginning with the first trip of the Star-South 
Canton 345 kV line, AEP contacted FE attempting to verify the trip. Later, their state estimation 
and contingency analysis tools indicated a contingency overload for Star-South Canton 345 kV 
line and AEP requested Transmission Loading Relief action by their reliability coordinator, PJM. 
A conversation beginning at 15:35 between AEP and PJM showed considerable confusion on the 
part of the reliability coordinator. 


PJM Operator: “Where specifically are you interested 
in?” 

AEP Operator: “The South Canton-Star.” 

PJM Operator: “The South Canton-Star. Oh, you know 
what? This is interesting. I believe this one is ours... that
one was actually in limbo one night, one time we needed 
it.” 

AEP Operator: “For AEP?” 

PJM Operator: “For AEP, yes. I'm thinking. South 
Canton - where'd it go? South Canton-Star, there it is. 
South Canton-Star for loss of Sammis-Star?” 

AEP Operator: “Yeah.” 

PJM Operator: “That's the one. That's currently ours. You need it?”

AEP Operator: “I believe. Look what they went to.” 

PJM Operator: “Let's see. Oh, man. Sammis-Star, okay. Sammis-Star for South Canton-Star. 
South Canton-Star for Sammis-Star, (inaudible). All right, you're going to have to help me out. 
What do you need on it...?” 

AEP Operator: “Pardon?” 

PJM Operator: “What do you need? What do you need on it? How much relief you need?”

AEP Operator: “Quite a bit.”

PJM Operator: “Quite a bit. What's our limit?” 

AEP Operator: “I want a 3-B.” 

PJM Operator: “3-B.” 


Cause 3c: The PJM and MISO reliability coordinators lacked an effective procedure on when and
how to coordinate an operating limit violation observed by one of them in the other’s area due to
a contingency near their common boundary. The lack of such a procedure caused ineffective
communications between PJM and MISO regarding PJM’s awareness of a possible overload on
the Sammis-Star line as early as 15:48. An effective procedure would have enabled PJM to more
clearly communicate the information it had regarding limit violations on the FE system, and would
have enabled MISO to be aware of those conditions and initiate corrective actions with FE.

AEP Operator: “It's good for 1,412, so I need how much cut off?”

PJM Operator: “You need like... 230, 240.” 

PJM Operator: “Now let me ask you, there is a 345 line locked out DPL Stuart to Atlanta. Now, I 
still haven't had a chance to get up and go see where that is. Now, I don't know if that would 
have an effect.” 

AEP Operator: “1,341. I need - man, I need 300. I need 300 megawatts cut.” 

PJM Operator: “Okay. Verify our real-time flows on...” 

From this conversation it appears that the PJM reliability coordinator is not closely monitoring 
Dayton Power and Light and AEP facilities (areas for which PJM has reliability coordinator 
responsibility) in real time. Further, the operator must leave the desk to determine where a 345 
kV line is within the system, indicating a lack of familiarity with the system. 

AEP Operator: “What do you have on the Sammis-Star, do you know?” 

PJM Operator: “I'm sorry? Sammis-Star, okay, I'm showing 960 on it and it's highlighted in blue.
Tell me what that means on your machine.” 

AEP Operator: “Blue? Normal. Well, it's going to be in blue, I mean - that's what's on it?” 

PJM Operator: “960, that's what it says.” 

AEP Operator: “That circuit just tripped. South Canton-Star.” 

PJM Operator: “Did it?” 

AEP Operator: “It tripped and re-closed...” 

AEP Operator: “We need to get down there now so they can cut the top of the hour. Is there 
anything on it? What's the flowgate, do you know?” 

PJM Operator: “Yeah, I got it in front of me. It is-it is 2935.” 

AEP Operator: “Yeah... 2935. I need 350 cut on that.”

PJM Operator: “Whew, man.” 

AEP Operator: “Well, I don't know why. It popped up all of a sudden like that... that thing just
popped up so fast.”

PJM Operator: “And... 1,196 on South Canton. Can you verify these? And 960 on - South 
Canton-Star 1,196, Sammis-Star 960?” 

AEP Operator: “They might be right. I'm...” 

PJM Operator: “They were highlighted in blue, I guess I thought maybe that was supposed to be 
telling me something.” 
This conversation demonstrates that the PJM operator is not fully familiar with the monitoring
system being used. The PJM operator asks the AEP operator what the blue highlighting on the
screen represents, presumably because the AEP operator is more familiar with the system the
PJM operator is using.

The AEP operators are witnessing portions of the 138-kV cascade and relaying that information
to the PJM operator. As the exchange below shows, the PJM operator is looking at a state
estimator screen rather than real-time flows or status of the AEP system, and is therefore
unaware of these line trips from his own monitoring system.

PJM Operator: “...Sammis-Star, I'm still seeing flow on both those lines. Am I'm looking at state
estimated data?”

AEP Operator: “Probably.” 

PJM Operator: “Yeah, it's behind, okay. You're able to see raw data?” 

AEP Operator: “Yeah; it's open. South Canton-Star is open.” 

PJM Operator: “South Canton-Star is open. Torrey-Cloverdale?” 

AEP Operator: “Oh, my God, look at all these open...” 

AEP Operator: “We have more trouble... more things are tripping. East Lima and New Liberty tripped
out. Look at that.”

AEP Operator: “Oh, my gosh, I'm in deep...”

PJM Operator: “You and me both, brother. What are we going to do? You need something, you 
just let me know.” 

AEP Operator: “Now something else just opened up. A lot of things are happening.” 

PJM Operator: “Okay... South Canton-Star. Okay, I'm seeing a no-flow on that. So what are we 
overloading now? We lost South Canton-Star, we're going to overload Sammis-Star, right? The 
contingency is going to overload, which is the Sammis-Star. The FE line is going to overload as 
a result of that. So I should probably talk to MISO.” 

AEP Operator: “Pardon?” 

PJM Operator: “I should probably talk to MISO because they're going to have to talk to FE.” 

While the AEP operators continued to witness the evolving cascade on the FE 138-kV system, the
conversation ended, and PJM called MISO at 15:55. PJM reported the Star-South Canton trip to
MISO, but the two reliability coordinators' measurements of the resulting flows on the FE
Sammis-Star line did not match, causing them to wonder whether the Star-South Canton line had
returned to service. From the MISO operator phone transcripts:

PJM Operator: “... AEP, it looks like they lost South Canton-Star 345 line, and we are showing a
contingency for that line and the Sammis-Star line, and one of them lost the other. Since they lost
that line, I was wondering if you could verify flows on the Sammis-Star line for me at this time.”
MISO Operator: “Well, let's see what I’ve got. I know that First Energy lost their Juniper line,
too.” 

PJM Operator: “Did they?” 

MISO Operator: “They are still investigating that, too. So the Star-Juniper line was overloaded.”

PJM Operator: “Star-Juniper.”

MISO Operator: “And they recently have got that under control here.” 

PJM Operator: “And when did that trip? That might have...” 

MISO Operator: “I don't know yet. I still have - I have not had that chance to investigate it.
There is too much going on right now.”

PJM Operator: “Yeah, we are trying to figure out what made that one jump up on us so quick.”

MISO Operator: “It may be a combination of both. You guys lost South Canton to Star.”

PJM Operator: “Yes.” 

MISO Operator: “And we lost Hanna to Juniper it looks like.” 

PJM Operator: “Yes. And we were showing an overload for Sammis to Star for the South Canton 
to Star. So I was concerned, and right now I am seeing AEP systems saying Sammis to Star is at 
1378.” 

MISO Operator: “All right. Let me see. I have got to try and find it here, if it is possible and I 
can go from here to Juniper Star. How about 1109?” 

PJM Operator: “1,109?” 

MISO Operator: “I see South Canton Star is open, but now we are getting data of 1199, and I am 
wondering if it just came after.” 

PJM Operator: “Maybe it did. It was in and out, and it had gone out and back in a couple of 
times.” 

MISO Operator: “Well, yeah, it would be no good losing things all over the place here.” 

PJM Operator: “All right. I just wanted to verify that with you, and I will let you tend to your 
stuff.” 

MISO Operator: “Okay.” 

PJM Operator: “Thank you, sir. Bye.”

Considering the number of facilities lost, and that each reliability coordinator was discovering
line outages he had not previously known about, there is an eerie lack of urgency and no
discussion of actions to be taken. The MISO operator provided some additional information about
transmission line outages in FE, even though MISO did not have direct monitoring capability for
FE facilities on August 14. The PJM operator indicated that the South Canton-Star line was out
of service, but did not relay any of the information about the other lines the AEP operator had
reported as tripping. The MISO operator did not act on this information, and the PJM operator
did not press the issue.

The investigation showed that by 15:55, at the start of this PJM-MISO call, the overload on the
Sammis-Star line had exceeded 110% and was continuing to worsen. The overload began at 15:42,
after the Star-South Canton 345-kV line locked open. At 16:05:57, just prior to tripping, fault
recorders showed a Sammis-Star flow of 2,850 amperes, or 130% of its 2,193-ampere emergency
rating.
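The 130% figure follows directly from the two currents quoted above. The short Python sketch below reproduces that arithmetic and, as an added illustration, converts amperes to three-phase MVA using S = √3 × V × I. The use of the nominal 345-kV voltage in the conversion is an assumption; actual voltages were depressed at the time, so the true MVA flow would have been somewhat lower than this conversion suggests.

```python
import math


def percent_of_rating(flow: float, rating: float) -> float:
    """Express a flow as a percentage of a rating given in the same units."""
    return 100.0 * flow / rating


def amps_to_mva(current_a: float, kv_line_to_line: float) -> float:
    """Three-phase apparent power S = sqrt(3) * V_LL * I, returned in MVA."""
    return math.sqrt(3) * kv_line_to_line * current_a / 1000.0  # kV * A = kVA -> MVA


# Sammis-Star values quoted in the text for 16:05:57, just before the trip.
FLOW_A = 2850.0              # fault recorder reading, amperes
EMERGENCY_RATING_A = 2193.0  # emergency rating, amperes

print(f"Loading: {percent_of_rating(FLOW_A, EMERGENCY_RATING_A):.0f}% of emergency rating")
# -> Loading: 130% of emergency rating

# Assumption: nominal 345 kV. Voltages were actually depressed, so true MVA was lower.
for amps in (FLOW_A, EMERGENCY_RATING_A):
    print(f"{amps:.0f} A at 345 kV nominal = {amps_to_mva(amps, 345.0):.0f} MVA")
```

At nominal voltage, the 2,193-ampere emergency rating works out to roughly 1,310 MVA and the 2,850-ampere flow to roughly 1,703 MVA.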

At 15:56, still concerned about the impact of the Star-South Canton trip, PJM called FE to report
that Star-South Canton had tripped and that PJM believed Sammis-Star was actually overloaded
beyond its emergency limit. FE could not confirm this overload. Investigators later discovered
that FE was using a higher rating for the Sammis-Star line than MISO, AEP, and PJM were using,
indicating ineffective coordination of FE line ratings with others. 4 FE informed PJM that
Hanna-Juniper was also out of service. At this time, FE operators still believed that the problems
existed beyond their system, one of them saying, “AEP must have lost some major stuff.”
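The practical effect of the rating mismatch described in footnote 4 can be shown with a small sketch: the same flow is classified differently depending on which rating pair is applied. The rating values come from that footnote; the flow values and the three-way classification are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    normal_mva: float
    emergency_mva: float


def classify(flow_mva: float, rating: Rating) -> str:
    """Classify a flow against a normal/emergency rating pair (illustrative labels)."""
    if flow_mva > rating.emergency_mva:
        return "above emergency"
    if flow_mva > rating.normal_mva:
        return "above normal"
    return "within normal"


# Summer ratings for Sammis-Star as given in footnote 4.
fe_rating = Rating(normal_mva=1310.0, emergency_mva=1310.0)
others_rating = Rating(normal_mva=950.0, emergency_mva=1076.0)  # MISO, PJM, AEP

# Hypothetical flows chosen to show where the two rating sets disagree.
for flow in (900.0, 1000.0, 1200.0):
    print(f"{flow:.0f} MVA | FE: {classify(flow, fe_rating)}"
          f" | MISO/PJM/AEP: {classify(flow, others_rating)}")
```

Under these numbers, any flow between 1,076 and 1,310 MVA would appear as an emergency overload to MISO, PJM, and AEP while still appearing within normal limits to FE.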

Modeling indicates that returning either the Hanna-Juniper or the Chamberlin-Harding line to
service would have reduced, but not eliminated, the 138-kV overloads. Returning both lines
would have brought all of the 138-kV lines within their emergency ratings. However, all three
345-kV lines had already been compromised by tree contacts, so it is unlikely that FE could have
successfully restored either line even had its operators known the lines had tripped out. Also,
since Star-South Canton had already tripped and reclosed three times, it is unlikely that an
operator aware of this would have trusted it to operate securely under emergency conditions.
While generation redispatch alone would not have solved the overload problem, modeling
indicates that shedding load in the Cleveland and Akron areas could have reduced most line
loadings to within emergency ratings and helped to stabilize the system. However, the amount
of load shedding required grew rapidly as the FE system unraveled.

F. Localized Cascade of the 138-kV System in Northeastern 
Ohio: 15:39 to 16:08 EDT 

1. Summary 

At 15:39, a series of 138-kV line trips began in the vicinity of Akron because the loss of the
Chamberlin-Harding, Hanna-Juniper, and Star-South Canton 345-kV lines overloaded the
138-kV system with electricity flowing north toward the Akron and Cleveland loads. Voltages in
the Akron area also began to decrease and eventually fell below low limits.

One of the two Pleasant Valley-West Akron lines was the first 138-kV line to trip at 15:39:37,
indicating the start of a cascade of 138-kV line outages in that area. A total of seven 138-kV
lines tripped during the next 20 minutes, followed at 15:59 by a stuck-breaker operation that
cleared the 138-kV bus at West Akron and instantaneously opened five more 138-kV lines. Four
additional 138-kV lines eventually opened over a three-minute period from 16:06 to 16:09, after
the Sammis-Star 345-kV line opened to signal the transition from a localized failure to a
spreading wide-area cascade.

4 Specifically, FE was operating Sammis-Star assuming that the 345-kV line was rated for summer normal
use at 1,310 MVA, with a summer emergency limit rating of 1,310 MVA. In contrast, MISO, PJM, and
AEP were using a more conservative 950 MVA normal rating and 1,076 MVA emergency rating for this
line. The facility owner (in this case FE) develops the line rating. It has not been determined when FE
changed the ratings it was using; they did not communicate the changes to all concerned parties.

During this same period at 15:45:41, the Canton Central-Tidd 345-kV line tripped and then 
reclosed at 15:46:29. The Canton Central 345/138-kV Circuit Breaker A1 operated multiple 
times, causing a low air pressure problem that inhibited circuit breaker tripping. This event 
forced the Canton Central 345/138-kV transformer to disconnect and remain out of service, 
further weakening the Canton-Akron area 138-kV transmission system. 

Approximately 600 MW of customer loads were shut down in Akron and areas to the west and 
south of the city during the cascade because they were being served by transformers connected to 
those lines. As the lines failed, severe voltage drops caused a number of large industrial 
customers with voltage-sensitive equipment to go off-line automatically to protect their 
operations. 



Figure III.7 — Akron Area Substations Participating in Localized 138-kV Cascade 
2. 138-kV Localized Cascade Sequence of Events

From 15:39 to 15:58:47, seven 138-kV lines in northern Ohio tripped and locked out. 


Table III.1 — 138-kV Line Trips Near Akron: 15:39 to 15:58:47

• 15:39:17: Pleasant Valley-West Akron 138-kV line tripped and reclosed at both ends.

• 15:42:05: Pleasant Valley-West Akron 138-kV West line tripped and reclosed.

• 15:44:40: Pleasant Valley-West Akron 138-kV West line tripped and locked out. B Phase had
sagged into the underlying distribution conductors.

• 15:42:49: Canton Central-Cloverdale 138-kV line tripped and reclosed.

• 15:45:40: Canton Central-Cloverdale 138-kV line tripped and locked out. Phase-to-ground pilot
relay targets were reported at both ends of the line. DFR analysis identified the fault to be
7.93 miles from Canton Central. The Canton Central 138-kV bus is a ring bus. A 138-kV circuit
breaker failed to clear the 138-kV line fault at Canton Central. This breaker is common to the
138-kV autotransformer bus and the Canton Central-Cloverdale 138-kV line.

• 15:42:53: Cloverdale-Torrey 138-kV line tripped.

• 15:44:12: East Lima-New Liberty 138-kV line tripped. B Phase sagged into the underbuild.

• 15:44:32: Babb-West Akron 138-kV line tripped and locked out.

• 15:51:41: East Lima-North Findlay 138-kV line tripped and reclosed at the East Lima end only.
At the same time, the Fostoria Central-North Findlay 138-kV line tripped and reclosed, but never
locked out.

• 15:58:47: Chamberlin-West Akron 138-kV line tripped. Relays indicate a probable trip on
overload.


With the Canton Central-Cloverdale 138-kV line trip at 15:45:40, the Canton Central 345-kV and
138-kV circuit breakers opened automatically to clear the fault via breaker failure relaying.
Transfer trip initiated circuit breaker tripping at the Tidd substation end of the Canton
Central-Tidd 345-kV line. After the fault was interrupted, a 345-kV disconnect opened
automatically, disconnecting two autotransformers from the Canton Central-Tidd 345-kV line.
The 138-kV circuit breaker’s breaker-failure relay operated as designed to clear the fault. After
the 345-kV disconnect opened, the 345-kV circuit breakers automatically reclosed, and Canton
Central-Tidd 345-kV was restored at 15:46:29.


Table III.2 — West Akron Stuck Breaker Failure

• 15:59:00: West Akron-Aetna 138-kV line opened.

• 15:59:00: Barberton 138-kV line opened at West Akron end only. West Akron-B18 138-kV
tie breaker opened, affecting West Akron 138/12-kV transformers #3, 4, and 5 fed
from Barberton.

• 15:59:00: West Akron-Granger-Stoney-Brunswick-West Medina opened.

• 15:59:00: West Akron-Pleasant Valley 138-kV East line (Q-22) opened.

• 15:59:00: West Akron-Rosemont-Pine-Wadsworth 138-kV line opened.


The West Akron substation 138-kV bus was cleared at 15:59:00 due to a circuit breaker failure.
The circuit breaker supplied a 138/69-kV transformer. The transformer’s phase directional power
relay operated to initiate the trip of the breaker and its subsequent breaker failure backup
protection. There was no system fault at the time. The phase directional power relay operated
because the 69-kV system in the Akron area had become a supply to the 138-kV system. This
reversal of power was due to the number of 138-kV lines that had tripped, either on overload or
on line faults caused by overloads.

Investigators believe that the 138-kV circuit breaker failed because it was slow to operate. At
15:59:00, the West Akron 138-kV bus cleared from a failure-to-trip relay on 138-kV circuit
breaker B26, which supplies 138/69-kV transformer number 1. The breaker trip was initiated
by a phase directional overcurrent relay in the B26 relay circuit looking directionally into the
138-kV system from the 69-kV system. The West Akron 138/12-kV transformers remained
connected to the Barberton-West Akron 138-kV line, but power flow to West Akron 138/69-kV
transformer number 1 was interrupted. Output of the failure-to-trip (breaker failure) timer
initiated a trip of all five remaining 138-kV lines connected at West Akron. Investigators believe
that the relay may have operated due to high reactive power flow into the 138-kV system. This is
possible even though real power was flowing into the 69-kV system at the time.
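The investigators' point that a relay could respond to reactive power flowing toward the 138-kV system while real power flows toward the 69-kV system rests on the fact that real power (P) and reactive power (Q) need not flow in the same direction. The sketch below computes three-phase complex power S = 3 × V × I* from hypothetical phasors at the 138-kV terminal of a 138/69-kV transformer. It does not model the actual B26 relay characteristic; the voltage, the 400-ampere current, the 60-degree angle, and the sign convention are all illustrative assumptions.

```python
import cmath
import math


def three_phase_power(v_ln: complex, i_line: complex) -> complex:
    """Three-phase complex power S = 3 * V * conj(I) for a balanced system."""
    return 3 * v_ln * i_line.conjugate()


# Hypothetical phasors at the 138-kV terminal of a 138/69-kV transformer.
# Assumed sign convention: positive current flows from the 138-kV bus into the
# transformer, so P > 0 means real power toward the 69-kV side and Q < 0 means
# reactive power flowing out of the transformer into the 138-kV system.
v = cmath.rect(138e3 / math.sqrt(3), 0.0)  # line-to-neutral voltage, reference phasor
i = cmath.rect(400.0, math.radians(60.0))  # 400 A leading the voltage by 60 degrees

s = three_phase_power(v, i)
print(f"P = {s.real / 1e6:.1f} MW, Q = {s.imag / 1e6:.1f} Mvar")
# -> P = 47.8 MW, Q = -82.8 Mvar
```

In this made-up case, real power flows toward the 69-kV side while a larger reactive flow heads back into the 138-kV system, which is the combination the investigators describe as possible.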

From 16:00 to 16:08:59, four additional 138-kV lines tripped and locked out, some before and
some after the Sammis-Star 345-kV line trip. After the Cloverdale-Torrey line failed at 15:42,
Dale-West Canton was the most heavily loaded line on the FE system. It held on, although
overloaded to between 160 and 180 percent of its normal rating, until tripping at 16:05:55. Its
loss had a significant effect on the area, and voltages dropped sharply afterward.

Even more importantly, loss of the Dale-West Canton line shifted power from the 138-kV system
back to the remaining 345-kV network, pushing Sammis-Star’s loading above 120 percent of its
rating. This rating is a substation equipment rating rather than a transmission line thermal rating;
therefore, sag was not an issue. Two seconds later, at 16:05:57, Sammis-Star tripped and locked
out. Unlike the previous three 345-kV lines, which tripped on short circuits due to tree contacts,
Sammis-Star tripped because its protective relays saw low apparent impedance (depressed
voltage divided by abnormally high line current); that is, the relay reacted as if the high flow were
due to a short circuit. Although three more 138-kV lines dropped quickly in Ohio following the
Sammis-Star trip, loss of the Sammis-Star line marked the turning point at which problems in
northeast Ohio initiated a cascading blackout across the Northeast.
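Because apparent impedance is voltage divided by current, a drop in voltage and a rise in current both push it downward toward a distance relay's trip zone. The sketch below computes the magnitude |Z| = (V_LL / √3) / I in primary ohms for two cases. The 2,850-ampere current is the figure quoted in this report; the 1,000-ampere light-load case and the assumed 92 percent voltage are hypothetical values chosen only to show the direction of the effect.

```python
import math


def apparent_impedance_ohms(kv_line_to_line: float, amps: float) -> float:
    """Magnitude of apparent impedance seen by a distance relay, in primary ohms.

    |Z| = line-to-neutral voltage / line current.
    """
    return (kv_line_to_line * 1000.0 / math.sqrt(3)) / amps


# Hypothetical light-load case: nominal 345 kV and 1,000 A.
print(f"{apparent_impedance_ohms(345.0, 1000.0):.1f} ohms")  # -> 199.2 ohms
# Heavy-load case: the 2,850 A cited in the text, voltage assumed at 92% of nominal.
print(f"{apparent_impedance_ohms(0.92 * 345.0, 2850.0):.1f} ohms")  # -> 64.3 ohms
```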


Table III.3 — Additional 138-kV Line Trips Near Akron

• 16:05:55: Dale-West Canton 138-kV line tripped at both ends, reclosed at West Canton only.

• 16:05:57: Sammis-Star 345-kV line tripped.

• 16:06:02: Star-Urban 138-kV line tripped (reclosing is not initiated for backup trips).

• 16:06:09: Richland-Ridgeville-Napoleon-Stryker 138-kV line tripped and locked out at all
terminals.

• 16:08:58: Ohio Central-Wooster 138-kV line tripped.

• 16:08:55: East Wooster-South Canton 138-kV line tripped, but successful automatic reclosing
restored this line.


3. Sammis-Star 345-kV Line Trip: Pivot Point 

Sammis-Star did not trip due to a short circuit to ground (as did the prior 345-kV lines that
tripped). Sammis-Star tripped due to protective relay action that measured low apparent
impedance (depressed voltage divided by abnormally high line current) (Figure III.9). There
was no fault and no major power swing at the time of the trip; rather, high flows above the
line’s emergency rating, together with depressed voltages, caused the overload to appear to the
protective relays as a remote fault on the system. In effect, the relay could no longer differentiate
between a remote three-phase fault and a high line-load condition. Moreover, the reactive flows
(Vars) on the line were almost ten times higher than they had been earlier in the day. The
steady-state loading on the line had increased gradually to the point where the operating point
entered the zone 3 relay trip circle. The relay operated as it was designed to do. By design,
reclosing is not initiated for trips initiated by backup relays.
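The "trip circle" referred to above is commonly a mho characteristic: a circle in the resistance-reactance (R-X) plane that passes through the origin, with its diameter set by the relay reach along a chosen angle. A minimal sketch of the membership test follows. The reach, the reach angle, and both operating points are hypothetical values, not the Sammis-Star settings, and the sketch ignores the zone 3 time delay.

```python
import cmath
import math


def in_mho_circle(z: complex, reach_ohms: float, reach_angle_deg: float) -> bool:
    """Return True if impedance z lies inside a mho circle through the origin.

    The circle's diameter runs from the origin to the reach point
    (reach_ohms at reach_angle_deg), so its center is half of that vector.
    """
    reach = cmath.rect(reach_ohms, math.radians(reach_angle_deg))
    center = reach / 2
    return abs(z - center) <= abs(reach) / 2


# Hypothetical zone 3 setting (illustrative only, not the Sammis-Star setting).
REACH_OHMS, REACH_ANGLE_DEG = 100.0, 80.0

# Hypothetical operating points expressed as apparent impedance (magnitude, angle).
light_load = cmath.rect(180.0, math.radians(15.0))  # high |Z|, mostly real power
heavy_load = cmath.rect(60.0, math.radians(35.0))   # low |Z|, heavier reactive flow

print(in_mho_circle(light_load, REACH_OHMS, REACH_ANGLE_DEG))  # -> False
print(in_mho_circle(heavy_load, REACH_OHMS, REACH_ANGLE_DEG))  # -> True
```

The test is equivalent to |Z| ≤ reach × cos(angle of Z − reach angle), which is why a smaller impedance magnitude with a larger angle (more reactive flow) is more likely to fall inside the circle.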

As shown in Figure III.8, the Sammis-Star line trip completely severed the 345-kV path into 
northern Ohio from southeast Ohio, triggering a new, fast-paced sequence of 345-kV 
transmission line trips in which each line trip placed a greater flow burden on those lines 
remaining in service. After Sammis-Star tripped, there were only three paths left for power to 
flow into northern Ohio: (1) from northwestern Pennsylvania to northern Ohio around the south 
shore of Lake Erie, (2) from southern Ohio, and (3) from eastern Michigan and Ontario. 
Northeastern Ohio had been substantially weakened as a source of power to eastern Michigan, 
making the Detroit area more reliant on 345-kV lines west and northwest of Detroit, and from 
northwestern Ohio to eastern Michigan. 



Figure III.8 — Cleveland-Akron Cut Off 


After the Sammis-Star line trip, the conditions were set for an uncontrolled cascade of line
failures that would separate the northeastern United States and eastern Canada from the rest of
the Eastern Interconnection, followed by a breakup and collapse of much of that newly formed
island. An important distinction is drawn here: no events, actions, or failures to take action after
the Sammis-Star trip can be deemed to have caused the blackout. Later sections will address
other factors that affected the extent and severity of the blackout.

The Sammis-Star line tripped at Sammis Generating Station due to a zone 3 impedance relay.
There were no system faults occurring at the time. The relay tripped because increased real and
reactive power flow caused the apparent impedance to fall within the impedance circle (reach) of
the relay. Several 138-kV line outages just prior to the tripping of Sammis-Star contributed to
the tripping of this line, as did low voltages and the increased reactive power flow into the line
from Sammis Generating Station. Before the loss of Sammis-Star, operator action to shed load
may have been an appropriate response. After the Sammis-Star line trip, only automatic
protection systems could have mitigated the cascade.

A zone 3 relay can be defined as an impedance relay that is set to detect system faults on the
protected transmission line and beyond. 5 It sometimes serves a dual purpose. Acting through a
timer, it can see faults beyond the next bus, up to and including the furthest remote element
attached to that bus. It is used for equipment protection beyond the line, as an alternative to the
communication-based equipment failure schemes sometimes referred to as breaker failure transfer
trip. Zone 3 relays can also be used in the high-speed relaying system for the line. In this
application, the relay needs directional information from the other end of the line, which it
receives via a highly reliable communication system. In the Sammis-Star trip, the zone 3 relay
operated because it was set to detect a remote fault on the 138-kV side of a Star substation
transformer in the event of a breaker failure.
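Setting zone 3 to reach through a remote transformer, as described above, enlarges the relay's circle and therefore lowers the load at which the circle is entered; this is the load-encroachment effect that Figure III.9 depicts. The sketch below estimates that load level for a mho circle through the origin using the boundary condition |Z| = reach × cos(load impedance angle − reach angle) and S = V_LL² / |Z|. All reaches, angles, and voltages are hypothetical; the point is only the direction of the two effects, namely that a longer reach and a lower voltage both reduce the MVA at which the relay would see load as a fault.

```python
import math


def encroachment_mva(kv_ll: float, reach_ohms: float, reach_angle_deg: float,
                     load_angle_deg: float) -> float:
    """Three-phase MVA at which a load enters a mho circle through the origin.

    Boundary: |Z| = reach * cos(load_angle - reach_angle).
    Apparent power at that impedance: S = V_LL**2 / |Z|  (kV**2 / ohm = MVA).
    Returns infinity if the load angle lies outside the circle's angular span.
    """
    z_boundary = reach_ohms * math.cos(math.radians(load_angle_deg - reach_angle_deg))
    if z_boundary <= 0.0:
        return math.inf
    return kv_ll ** 2 / z_boundary


# Hypothetical settings: 80-degree reach angle, load impedance angle of 30 degrees.
for reach in (60.0, 100.0):          # shorter reach vs. reach extended past a transformer
    for kv in (345.0, 0.9 * 345.0):  # nominal vs. depressed voltage
        mva = encroachment_mva(kv, reach, 80.0, 30.0)
        print(f"reach {reach:.0f} ohms, {kv:.1f} kV: load enters zone 3 above about {mva:.0f} MVA")
```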



[Plot; horizontal axis: Resistance in Ohms]


Figure III.9 — Load Encroachment of Sammis-Star Zone 3 Impedance Relay 


5 Zone 3 in this context means all forward and overreaching distance relays, which could 
also include zone 2 distance relays. 
IV. Cascading Failure of the Power System

Section III described how uncorrected problems in northern Ohio developed up to 16:05:57, the last point at
which a cascade of line trips could have been averted. The investigation also sought to understand how 
and why the cascade spread and stopped as it did. This section details the sequence of events in the 
cascade, how and why it spread, and how it stopped in each geographic area. 

The cascade spread beyond Ohio and caused a widespread blackout for three principal reasons. First, the 
loss of the Sammis-Star line in Ohio, following the loss of other transmission lines and weak voltages 
within Ohio, triggered many subsequent line trips. Second, many of the key lines that tripped between 
16:05:57 and 16:10:38 operated on zone 3 impedance relays (or zone 2 relays set to operate like zone 3s), 
which responded to overloads rather than faults on the protected facilities. The speed at which they 
tripped accelerated the spread of the cascade beyond the Cleveland-Akron area. Third, the evidence 
indicates that the relay protection settings for the transmission lines, generators, and underfrequency load
shedding in the Northeast may not be sufficient to reduce the likelihood and consequences of a cascade,
nor were they intended to do so. These issues are discussed in depth below. 

This analysis is based on close examination of the events in the cascade, supplemented by dynamic 
simulations of the electrical phenomena that occurred. At the time this report was completed, the modeling had
progressed through 16:11:00 and was continuing. Thus, this section is informed and validated by
modeling up until that time. Explanations after that time reflect the investigation team’s best hypotheses 
given the available data, and may be confirmed or modified when the modeling is complete. Final 
modeling results will be published as a technical report at a later date. 

A. How the Cascade Evolved 

A series of line outages in northeastern Ohio starting at 15:05 caused heavy loadings on parallel circuits, 
leading to the trip and lock-out of the Sammis-Star line at 16:05:57. This was the event that triggered a 
cascade of line outages on the high voltage system, causing electrical fluctuations and generator trips such 
that within seven minutes the blackout rippled from the Cleveland-Akron area across much of the 
northeastern United States and Canada. By 16:13, more than 508 generating units at 265 power plants 
had been lost, and tens of millions of people in the United States and Canada were without electric power. 

The events in the cascade started slowly, but spread quickly. Figure IV.1 illustrates how the number of
lines and generators lost stayed relatively low during the Ohio phase of the blackout, but then picked up
speed after 16:08:59. The cascade was complete only two-and-one-half minutes later.
Figure IV.1 — Accumulated Line and Generator Trips During the Cascade


The collapse of the FE transmission system induced unplanned shifts of power across the region. Shortly 
before the collapse, large (but normal) electricity flows were moving through the FE system from 
generators in the south (Tennessee and Kentucky) and west (Illinois and Missouri) to load centers in 
northern Ohio, eastern Michigan, and Ontario. Once the 345-kV and 138-kV system outages occurred in 
the Cleveland-Akron area, power that was flowing into that area over those lines shifted onto lines to the 
west and the east. The rapid increase in loading caused a series of lines within northern Ohio to trip on 
zone 3 impedance relays. A “ripping” effect occurred as the transmission outages propagated west across 
Ohio into Michigan. The initial propagation of the cascade can best be described as a series of line trips 
caused by sudden, steady state power shifts that overloaded other lines — a “domino” effect. 
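The "domino" mechanism described above, in which each trip shifts flow onto the remaining paths until they too exceed their limits, can be illustrated with a deliberately simplified model. The sketch below assumes a single corridor of parallel paths carrying a fixed transfer that divides in inverse proportion to path reactance. The path names, reactances, ratings, and transfer levels are hypothetical; real redistribution depends on the full network, on voltages, and on relay timing, none of which this toy captures.

```python
from __future__ import annotations


def cascade(transfer_mw: float, lines: dict[str, tuple[float, float]]) -> list[str]:
    """Toy "domino" cascade on parallel paths that share one fixed transfer.

    lines maps a name to (reactance_ohms, rating_mw). The transfer divides among
    in-service lines in proportion to 1/X, a DC-flow-style simplification for
    parallel paths between two buses. Every line above its rating trips at once,
    flows are recomputed on the survivors, and the loop repeats until no line is
    overloaded or none remain. Returns line names in the order they tripped.
    """
    in_service = dict(lines)  # copy so the caller's dict is not modified
    tripped: list[str] = []
    while in_service:
        total_admittance = sum(1.0 / x for x, _ in in_service.values())
        overloaded = [
            name
            for name, (x, rating) in in_service.items()
            if transfer_mw * (1.0 / x) / total_admittance > rating
        ]
        if not overloaded:
            break
        for name in overloaded:
            tripped.append(name)
            del in_service[name]
    return tripped


# Hypothetical corridor of three remaining paths: (reactance in ohms, rating in MW).
paths = {"B": (20.0, 700.0), "C": (25.0, 600.0), "D": (40.0, 500.0)}
print(cascade(1500.0, paths))  # -> []
print(cascade(1650.0, paths))  # -> ['B', 'C', 'D']
```

With a 1,500 MW transfer every path stays within its rating; at 1,650 MW path B overloads first, and its loss pushes C and D over their ratings in the next step.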

The line trips progressed westward across Ohio, then northward into Michigan, separating western and 
eastern Michigan, causing a 500 MW power reversal within Michigan toward Cleveland. Many of these 
line trips were caused by zone 3 impedance relay actions that accelerated the speed of the line trips. With 
paths cut from the west, a massive power surge flowed from PJM into New York and Ontario in a 
counter-clockwise flow around Lake Erie to serve the load still connected in eastern Michigan and 
northern Ohio. Transient instability began after 16:10:38, and large power swings occurred. First, a 
power surge of 3,700 MW flowed into Michigan across the Canadian border. Then the flow reversed by 
5,800 MW within one second and peaked at 2,100 MW from Michigan to Canada. Relays on the lines 
between PJM and New York saw massive power swings and tripped those lines. Ontario’s east-west tie 
line also tripped, leaving northwestern Ontario connected to Manitoba and Minnesota. 

The entire northeastern United States and eastern Ontario then became a large electrical island separated 
from the rest of the Eastern Interconnection. The major transmission split initially occurred along the 
long transmission lines across the Pennsylvania border to New York, and then proceeded into 
northeastern New Jersey. The resulting large electrical island, which had been importing power prior to 
the cascade, quickly became unstable after the massive transient swings and system separation. There 
was not sufficient generation on-line within the island to meet electricity demand. Systems to the south 
and west of the split, such as PJM, AEP, and others further away, remained intact and were mostly 
unaffected by the outage. Once the Northeast split from the rest of the Eastern Interconnection, the
cascade was isolated to that portion of the Interconnection. 

In the final phase, after 16:10:46, the large electrical island in the northeast had less generation than load, 
and was unstable with large power surges and swings in frequency and voltage. As a result, many lines 
and generators across the disturbance area tripped, breaking the area into several electrical islands. 
Generation and load within most of the smaller islands were unbalanced, leading to further tripping of 
lines and generating units until equilibrium was established in each island. Although much of the 
disturbance area was fully blacked out in this process, some islands were able to reach equilibrium 
between generation and load without a total loss of service. For example, the island consisting of most of 
New England and the Maritime provinces stabilized, and generation and load returned to balance. 

Another island consisted of load in western New York and a small portion of Ontario, supported by some 
New York generation, the large Beck and Saunders plants in Ontario, and the 765-kV interconnection to 
Quebec. These two large islands survived but other areas with large load centers collapsed into a 
blackout condition (Figure IV.2). 



Figure IV.2 — Area Affected by the Blackout 


B. Transmission System Cascade in Northern Ohio and South-Central 
Michigan 

1. Overview 

After the loss of Sammis-Star and the underlying 138-kV system, there were no large-capacity
transmission lines left from the south to support the significant amount of load in northern Ohio (Figure 
IV.3). This overloaded the transmission paths west and northwest into Michigan, causing a sequential 
loss of lines and power plants. 
The key events in this phase of the cascade were:

• 16:05:57: Sammis-Star 345-kV line tripped by zone 3 relay 

• 16:08:59: Galion-Ohio Central-Muskingum 345-kV line tripped 

• 16:09:06: East Lima-Fostoria Central 345-kV line tripped on zone 3 relay, causing a ripple of 
power swings through New York and Ontario into Michigan 

• 16:09:08 to 16:10:27: Several power plants were lost, totaling 946 MW 

C. Sammis-Star 345-kV Trip: 16:05:57 EDT 

Sammis-Star did not trip due to a short circuit to ground (as did the prior 345-kV lines that tripped). 
Sammis-Star tripped due to protective zone 3 relay action that measured low apparent impedance (Figure 
III.9). There was no fault and no major power swing at the time of the trip; rather, high flows at 130 
percent of the line’s emergency rating, together with depressed voltages, caused the overload to appear to 
the protective relays as a remote fault on the system. In effect, the relay could no longer differentiate 
between a remote three-phase fault and conditions of high loading and low voltage. Moreover, the 
reactive flows (Vars) on the line were almost ten times higher than they had been earlier in the day 
because of the degrading conditions in the Cleveland-Akron area. The relay operated as designed. 
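The relay behavior described above can be illustrated with a short calculation. A distance relay effectively computes the apparent impedance Z = V/I at its terminal; taking the bus voltage as the angle reference, this equals V_LL^2 / conj(S), so heavier MW flow, higher Var flow, and lower voltage all shrink |Z| and rotate it toward the line angle. The following is a minimal Python sketch assuming a mho characteristic; the zone 3 reach (150 ohms at 85 degrees) and both operating points are hypothetical values chosen for illustration, not the actual Sammis-Star settings or measured flows.

```python
import cmath
import math


def apparent_impedance(v_kv: float, p_mw: float, q_mvar: float) -> complex:
    """Per-phase impedance (ohms) seen by a relay at the sending end of a line.

    Taking the bus voltage as the angle reference, I = conj(S) / (sqrt(3) * V_LL),
    so Z = V_LN / I reduces to V_LL**2 / conj(S) (kV**2 / MVA = ohms).
    """
    s = complex(p_mw, q_mvar)  # three-phase MW + jMvar flowing into the line
    return v_kv ** 2 / s.conjugate()


def inside_mho(z: complex, reach_ohm: float, angle_deg: float) -> bool:
    """True if z falls inside a mho circle that passes through the origin."""
    z_reach = cmath.rect(reach_ohm, math.radians(angle_deg))
    return abs(z - z_reach / 2) <= abs(z_reach) / 2


# Hypothetical zone 3 setting and operating points (illustrative only).
ZONE3_REACH_OHM, ZONE3_ANGLE_DEG = 150.0, 85.0
normal = apparent_impedance(345.0, 800.0, 100.0)      # moderate flow, nominal voltage
stressed = apparent_impedance(320.0, 1400.0, 1000.0)  # heavy MW and Mvar, depressed voltage

for label, z in (("normal", normal), ("stressed", stressed)):
    trips = inside_mho(z, ZONE3_REACH_OHM, ZONE3_ANGLE_DEG)
    angle = math.degrees(cmath.phase(z))
    print(f"{label}: |Z| = {abs(z):.1f} ohm at {angle:.1f} deg, zone 3 picks up: {trips}")
```

With these assumed numbers, the normal operating point lies well outside the circle, while the heavily loaded, low-voltage point falls inside it even though no fault exists, which is the load-encroachment situation the paragraph describes.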

The Sammis-Star trip completely severed the 345-kV path into northern Ohio from southeastern Ohio, 
triggering a new, fast-paced sequence of 345-kV transmission line trips in which each line trip placed a 
greater flow burden on those lines remaining in service. These line outages left only three paths for 
power to flow into western Ohio: (1) from northwestern Pennsylvania to northern Ohio around the south 
shore of Lake Erie, (2) from southwestern Ohio toward northeastern Ohio, and (3) from eastern Michigan 
and Ontario. The line interruptions substantially weakened northeast Ohio as a source of power to eastern 
Michigan, making the Detroit area more reliant on 345-kV lines west and northwest of Detroit, and from 
northwestern Ohio to eastern Michigan. 



Soon after the Sammis-Star trip, four of the five 48 MW Handsome Lake combustion turbines in western 
Pennsylvania tripped off-line. These units are connected to the 345-kV system by the Homer City-Wayne 
345-kV line, and were operating that day as synchronous condensers to participate in PJM’s spinning 
reserve market (not to provide voltage support). When Sammis-Star tripped and increased loadings on 
the local transmission system, the Handsome Lake units were close enough electrically to sense the 
impact and tripped off-line at 16:07:00 on under-voltage relay protection. 
During the period between the Sammis-Star trip and the trip of East Lima-Fostoria 345-kV line at 
16:09:06.3, the system was still in a steady-state condition. Although one line after another was 
overloading and tripping within Ohio, this was happening slowly enough under relatively stable 
conditions that the system could readjust; after each line loss, power flows would redistribute across the 
remaining lines. This is illustrated in Figure IV.6, which shows the megawatt flows on the MECS 
interfaces with AEP (Ohio), FE (Ohio), and Ontario. The graph shows a shift from 150 MW of imports 
to 200 MW of exports from the MECS system into FE at 16:05:57 after the loss of Sammis-Star, after 
which this held steady until 16:08:59, when the loss of East Lima-Fostoria Central cut the main energy 
path from the south and west into Cleveland and Toledo. Loss of this path was significant, causing flow 
from MECS into FE to jump from 200 MW up to 2,300 MW, where it swung dynamically before 
stabilizing. 



Figure IV.6 — Line Flows Into Michigan 
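The redistribution described above follows from how power divides among parallel paths. In a DC (lossless, flat-voltage) power flow approximation, a transfer between two areas splits among parallel paths in proportion to each path's susceptance 1/x, so removing one path pushes its share onto the survivors. The following sketch assumes three hypothetical parallel paths with made-up reactances and ratings; it is a simplification of a meshed network, not a model of the actual Ohio system.

```python
def split_transfer(transfer_mw: float, reactances_pu: dict[str, float]) -> dict[str, float]:
    """DC power flow across parallel paths: each path carries a share proportional to 1/x."""
    susceptance = {name: 1.0 / x for name, x in reactances_pu.items()}
    total = sum(susceptance.values())
    return {name: transfer_mw * b / total for name, b in susceptance.items()}


def loading_percent(flows_mw: dict[str, float], ratings_mw: dict[str, float]) -> dict[str, float]:
    """Express each path's flow as a percentage of its rating."""
    return {name: 100.0 * flows_mw[name] / ratings_mw[name] for name in flows_mw}


# Hypothetical paths, reactances (per unit) and ratings (MW).
reactances = {"path_a": 0.01, "path_b": 0.02, "path_c": 0.02}
ratings = {"path_a": 2000.0, "path_b": 1200.0, "path_c": 1200.0}
TRANSFER_MW = 3000.0

before = split_transfer(TRANSFER_MW, reactances)
after = split_transfer(TRANSFER_MW, {k: v for k, v in reactances.items() if k != "path_a"})

print("before:", loading_percent(before, ratings))
print("after losing path_a:", loading_percent(after, ratings))
```

With these assumed values, losing path_a raises the two surviving paths from 62.5 percent to 125 percent of rating, the kind of step increase that can push the next line toward its own trip.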

1. Line Trips Westward across Ohio and Generator Trips in Michigan and Ohio: 16:08:59 to 16:10:27 EDT 

Key events in this portion of the cascade are: 

• 16:08:59: Galion-Ohio Central-Muskingum 345-kV line tripped 

• 16:09:06: East Lima-Fostoria Central 345-kV line tripped, causing a large power swing from 
Pennsylvania and New York through Ontario to Michigan 

The Muskingum-Ohio Central-Galion line tripped first at Muskingum at 16:08:58.5 on a phase-to-ground 
fault. The line reclosed and tripped again at 16:08:58.6 at Ohio Central. The line reclosed a second time 
and tripped again at Muskingum on a zone 3 relay. Finally, the line tripped and locked open at Galion on 
a low-magnitude B-phase ground fault. After the Muskingum-Ohio Central-Galion line outage and 
numerous 138-kV line trips in central Ohio, the East Lima-Fostoria Central line tripped at 16:09:06 on 
zone 3 relay operation due to high current and low voltage (80 percent of nominal). Modeling indicates that if 
automatic under-voltage load-shedding had been in place in northeastern and central Ohio, it might have 
been triggered at or before this point and dropped enough load to reduce or eliminate the subsequent line 
overloads that spread the cascade. The line trips across Ohio are shown in Figure IV.7. 
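Automatic under-voltage load shedding (UVLS) of the kind mentioned above is commonly implemented as one or more definite-time stages: if voltage stays below a pickup level for longer than a set delay, a block of load is dropped. The following sketch illustrates that logic under stated assumptions; the pickup levels, delays, block sizes, and voltage trace are hypothetical, and the sketch does not model how shedding would in turn raise voltage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UvlsStage:
    pickup_pu: float  # voltage (per unit) below which this stage's timer runs
    delay_s: float    # time voltage must stay below pickup before the stage operates
    shed_mw: float    # load block dropped when the stage operates


def run_uvls(voltages_pu: list[float], dt_s: float,
             stages: list[UvlsStage]) -> list[tuple[float, int, float]]:
    """Return (time_s, stage_index, shed_mw) for every stage that operates.

    Each stage is a definite-time element: its timer accumulates while the
    sampled voltage is below pickup and resets whenever voltage recovers.
    """
    timers = [0.0] * len(stages)
    operated = [False] * len(stages)
    events = []
    for k, v in enumerate(voltages_pu):
        t = k * dt_s
        for i, stage in enumerate(stages):
            if operated[i]:
                continue
            if v < stage.pickup_pu:
                timers[i] += dt_s
                if timers[i] >= stage.delay_s:
                    operated[i] = True
                    events.append((t, i, stage.shed_mw))
            else:
                timers[i] = 0.0
    return events


# Hypothetical settings and a synthetic trace: 2 s at nominal, then 4 s at 0.88 pu.
STAGES = [UvlsStage(0.92, 3.0, 200.0), UvlsStage(0.90, 5.0, 300.0)]
trace = [1.0] * 4 + [0.88] * 8  # one sample every 0.5 s
print(run_uvls(trace, 0.5, STAGES))
```

With this trace the first stage operates after 3 seconds below 0.92 pu, while the second stage does not operate because the 4-second depression in the trace is shorter than its 5-second delay.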



The tripping of the Galion-Ohio Central-Muskingum and East Lima-Fostoria Central transmission lines 
removed the transmission paths from southern and western Ohio into northern Ohio and eastern 
Michigan. Northern Ohio was connected to eastern Michigan by only three 345-kV transmission lines 
near the southwestern bend of Lake Erie. Thus, the combined northern Ohio and eastern Michigan load 
centers were left connected to the rest of the grid only by: (1) transmission lines eastward from 
northeastern Ohio to northwestern Pennsylvania along the southern shore of Lake Erie, and (2) westward 
by lines west and northwest of Detroit, Michigan, and from Michigan into Ontario (Figure IV.8). 



Figure IV.8 — Power Flows at 16:09:25 
Although the blackout of August 14 has been labeled by some as a voltage collapse, it was not a voltage 
collapse as that term has been traditionally used by power system engineers. Voltage collapse occurs 
when an increase in load or loss of generation or transmission facilities causes voltage to drop, which 
causes a further reduction in reactive power from capacitors and line charging, and still further voltage 
reductions. If the declines continue, these voltage reductions cause additional elements to trip, leading to 
further reduction in voltage and loss of load. The result is a progressive and uncontrollable decline in 
voltage because the power system is unable to provide the reactive power required to supply the reactive 
power demand. This did not occur on August 14. While the Cleveland-Akron area was short of reactive 
power reserves, there was sufficient reactive supply to meet the reactive power demand in the area and 
maintain stable albeit depressed 345-kV voltage for the outage conditions experienced from 13:31 to 
15:32, a period spanning the first forced outage at 13:31 (Eastlake 5 trip) through the third contingency at 15:32 
(Hanna-Juniper trip). Only after the fourth contingency, the lockout of South Canton-Star at 15:42, did 
the 345-kV voltage drop below 90 percent at the Star substation. 

As the cascade progressed beyond Ohio, it did not spread due to insufficient reactive power and a voltage 
collapse, but because of large line currents with depressed voltages, dynamic power swings when the East 
Lima-Fostoria Central line trip separated southern Ohio from northern Ohio, and the resulting transient 
instability after northern Ohio and eastern Michigan were isolated onto the Canadian system. 
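The traditional definition of voltage collapse discussed above can be made concrete with the textbook two-bus case: a source of voltage E behind a reactance X feeding a load P + jQ. The receiving-end voltage V satisfies V^4 + (2QX - E^2)V^2 + X^2(P^2 + Q^2) = 0, which has real solutions only up to the "nose" of the PV curve; beyond that point no operating voltage exists. The following sketch uses per-unit values chosen for illustration (E = 1.0, X = 0.5) and is not a model of the Cleveland-Akron system.

```python
import math
from typing import Optional


def receiving_voltage(p: float, q: float, e: float = 1.0, x: float = 0.5) -> Optional[float]:
    """Upper (normal operating) solution of the two-bus PV curve, in per unit.

    Solves V**4 + (2*Q*X - E**2) * V**2 + X**2 * (P**2 + Q**2) = 0 for V**2.
    Returns None when the load is beyond the nose and no real voltage exists.
    """
    b = 2.0 * q * x - e ** 2
    c = x ** 2 * (p ** 2 + q ** 2)
    disc = b ** 2 - 4.0 * c
    if disc < 0.0:
        return None
    return math.sqrt((-b + math.sqrt(disc)) / 2.0)


def nose_power(e: float = 1.0, x: float = 0.5, tan_phi: float = 0.0) -> float:
    """Maximum deliverable P for a load with constant Q/P = tan_phi (lagging if > 0)."""
    return e ** 2 / (2.0 * x * (tan_phi + math.sqrt(1.0 + tan_phi ** 2)))


for tan_phi in (0.0, 0.3):
    p_max = nose_power(tan_phi=tan_phi)
    print(f"Q/P = {tan_phi}: nose at P = {p_max:.3f} pu")
    for p in (0.25, 0.5, 0.75, p_max):
        v = receiving_voltage(p, p * tan_phi)
        print(f"  P = {p:.3f} pu -> V = {v:.3f} pu" if v is not None
              else f"  P = {p:.3f} pu -> no solution")
```

The second case shows that added reactive demand (Q/P = 0.3) lowers the nose, so a load that is deliverable at unity power factor (P = 0.75 pu) has no voltage solution once it also draws Vars.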

Figure IV.9 shows voltage levels recorded in the Niagara area. It shows that voltage levels remained 
stable until about 16:10:30, despite significant power fluctuations. In the cascade that followed, the 
voltage instability was a companion to, not a driver of, the angular instability that tripped generators and 
lines. A high-speed recording of 345-kV flows at Niagara Falls taken by the Hydro One recorders (shown 
as the lower plot in Figure IV.9) shows the impact of the East Lima-Fostoria Central trip and the 
New York-to-Ontario power swing, which continued to oscillate for more than ten seconds. Looking at the MW 
flow plot, it is clear that when Sammis-Star tripped, the system experienced oscillations that quickly 
damped out and rebalanced. But the East Lima-Fostoria Central trip triggered significantly greater oscillations that 
worsened in magnitude for several cycles, and then damped more slowly but continued to oscillate until 
the Argenta-Battle Creek trip 90 seconds later. Voltages also began to decline at that time. 



Figure IV.9 — New York-Ontario Line Flows at Niagara 
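The contrast drawn above between oscillations that quickly damped out and those that damped more slowly can be quantified from successive peaks of a recorded MW trace. One common approach is the logarithmic decrement: with peaks x0 and xn taken n cycles apart, delta = ln(x0/xn)/n and the damping ratio is zeta = delta / sqrt(4*pi^2 + delta^2). The following sketch applies this to synthetic peak values; it illustrates the technique only, is not an analysis of the Hydro One recordings, and assumes a single dominant oscillation mode.

```python
import math


def damping_ratio(peaks: list[float]) -> float:
    """Estimate the damping ratio from successive same-sign peak amplitudes.

    Uses the logarithmic decrement over the whole record; assumes one dominant
    mode and peak amplitudes measured from the post-disturbance mean flow.
    """
    if len(peaks) < 2 or min(peaks) <= 0.0:
        raise ValueError("need at least two positive peak amplitudes")
    n_cycles = len(peaks) - 1
    log_dec = math.log(peaks[0] / peaks[-1]) / n_cycles
    return log_dec / math.sqrt(4.0 * math.pi ** 2 + log_dec ** 2)


def synthetic_peaks(zeta: float, first_peak: float, count: int) -> list[float]:
    """Peaks of an ideal decaying oscillation with damping ratio zeta (for testing)."""
    log_dec = 2.0 * math.pi * zeta / math.sqrt(1.0 - zeta ** 2)
    return [first_peak * math.exp(-log_dec * k) for k in range(count)]


well_damped = synthetic_peaks(zeta=0.15, first_peak=400.0, count=4)
lightly_damped = synthetic_peaks(zeta=0.03, first_peak=900.0, count=10)
print(f"well damped: zeta = {damping_ratio(well_damped):.3f}")
print(f"lightly damped: zeta = {damping_ratio(lightly_damped):.3f}")
```

A lower estimated zeta corresponds to the slower decay seen after the East Lima-Fostoria Central trip; on real recordings, noise and multiple modes would call for more careful methods such as curve fitting.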
After the East Lima-Fostoria Central trip, power flows increased dramatically and quickly on the lines 
into and across southern Michigan. Although power had initially been flowing northeast out of Michigan 
into Ontario, that flow suddenly reversed and approximately 500 to 700 MW of power flowed southwest 
out of Ontario through Michigan to serve the load of Cleveland and Toledo. This flow was fed by 700 
MW pulled out of PJM through New York on its 345-kV network. This was the first of several inter-area 
power and frequency events that occurred over the next two minutes. This was the system’s response to 
the loss of the northwestern Ohio transmission paths, and the stress that the Cleveland, Toledo, and 
Detroit loads put onto the surviving lines and local generators. 

The far right side of Figure IV.9 shows the fluctuations in flows and voltages at the New York-Ontario 
Niagara border triggered by the trips of the Argenta-Battle Creek, Argenta-Tompkins, Hampton-Pontiac, 
and Thetford-Jewell 345-kV lines in Michigan, and the Erie West-Ashtabula-Perry 345-kV line linking 
the Cleveland area to Pennsylvania. Farther south, the very low voltages on the northern Ohio 
transmission system made it difficult for the generation in the Cleveland and Lake Erie area to remain 
synchronous with southeast Michigan. Over the next two minutes, generators in this area shut down after 
reaching a point of no recovery as the stress level across the remaining ties became excessive. 

Figure IV.10, showing metered power flows along the New York interfaces, documents how the flows 
heading north and west toward Detroit and Cleveland varied at three different New York interfaces. 
Beginning at 16:09:05, power flows jumped simultaneously across all three interfaces; but when the first 
power surge peaked at 16:09:09, the change in flow was highest on the PJM interface and lowest on the 
New England interface. Power flows increased significantly on the PJM-New York and New York-Ontario 
interfaces because of the redistribution of flow around Lake Erie to serve the loads in northern 
Ohio and eastern Michigan. The New England and Maritimes systems maintained the same generation- 
to-load balance and did not carry the redistributed flows because they were not in the direct path of the 
flows. Therefore, the New England-New York interface flows showed little response. 



Figure IV.10 — First Power Swing has Varying Impacts on New York Interfaces 
Before this first major power swing on the Michigan/Ontario interface, power flows in the NPCC Region 
(Quebec, Ontario and the Maritimes, New England, and New York) were typical for the summer period 
and well within acceptable limits. Up until this time, transmission and generation facilities were in a 
secure state across the NPCC region. 

2. Loss of Generation Totaling 946 MW: 16:09:08 to 16:10:27 EDT 

The following generation was lost from 16:09:08 to 16:10:27 (Figure IV.11): 

• 16:09:08: Michigan Cogeneration Venture (MCV) plant run back of 300 MW (from 1,263 MW 
to 963 MW) 

• 16:09:15: Avon Lake 7 unit tripped (82 MW) 

• 16:09:17: Burger 3, 4, and 5 units tripped (355 MW total) 

• 16:09:23 to 16:09:30: Kinder Morgan units 3, 6, and 7 tripped (209 MW total) 

The MCV plant in central Michigan experienced a 300 MW run-back. The Avon Lake 7 unit tripped due 
to the loss of the voltage regulator. The Burger units tripped after the 138-kV lines from the Burger 138-kV 
generating substation bus to substations in Ohio tripped from high reactive power flow due to the low 
voltages in the Cleveland area. Three units at the Kinder Morgan generating station in south-central 
Michigan tripped due to a transformer fault and over-excitation. 



Figure IV.11 — Michigan and Ohio Power Plants Trip or Run Back 
Power flows into Michigan from Indiana increased to serve loads in eastern Michigan and northern Ohio 
(still connected to the grid through northwest Ohio and Michigan) and voltages dropped from the 
imbalance between high loads and limited transmission and generation capability. 

D. High Speed Cascade 

Between 16:10:36 and 16:13, a period of less than two and a half minutes, a chain reaction of thousands of 
events occurred on the grid, driven by physics and automatic equipment operations. When it was over, 
much of the Northeast was in the dark. 

1. Transmission and Generation Trips in Michigan: 16:10:36 to 16:10:37 EDT 

The following key events occurred as the cascade propagated from Ohio and sliced through Michigan: 

• 16:10:36.2: Argenta-Battle Creek 345-kV line tripped 

• 16:10:36.3: Argenta-Tompkins 345-kV line tripped 

• 16:10:36.8: Battle Creek-Oneida 345-kV line tripped 

• 16:10:37: Sumpter units 1, 2, 3, and 4 tripped on under-voltage (300 MW near Detroit) 

• 16:10:37.5: MCV Plant output dropped from 944 MW to 109 MW on over-current protection 

Together, the above line outages interrupted the west-to-east transmission paths into the Detroit area from 
south-central Michigan. The Sumpter generating units tripped in response to under-voltage on the 
system. Michigan lines west of Detroit then began to trip, as shown in Figure IV.12. 



Figure IV.12 — Transmission and Generation Trips in Eastern Michigan, 16:10:36 to 16:10:37 

The Argenta-Battle Creek relay first opened the line at 16:10:36.230. The line reclosed automatically at 
16:10:37, then tripped again. This line connects major generators — including the Cook and Palisades 
nuclear plants and the Campbell fossil plant — to the Eastern MECS system. This line is designed with 
auto-reclose breakers at each end of the line, which do an automatic high-speed reclose as soon as they 
open to restore the line to service with no interruptions. Since the majority of faults on the North 
American grid are temporary, automatic reclosing can enhance stability and system reliability. However, 
situations can occur where the power systems behind the two ends of the line could go out of phase 
during the high-speed reclose period (typically less than 30 cycles, or one half second, to allow the air to 
de-ionize after the trip to prevent arc re-ignition). To address this and protect generators from the harm 
that an out-of-synchronism reconnection could cause, it is worth studying whether synchro-check relays are 
needed to permit reclosing only when the two ends of a line are within set voltage and phase-angle tolerances. 
No such protection was installed at Argenta-Battle Creek. When the line reclosed, there was a 70 degree 
difference in phase across the circuit breaker reclosing the line. There is no evidence that the reclose 
harmed the local generators. Power flows following the trip of the central Michigan lines are shown in 
Figure IV.13. 
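A synchro-check (synchronism-check) function of the kind discussed above supervises a reclose by comparing the voltages on the two sides of the open breaker. The following sketch shows the basic permissive logic; the 20-degree angle limit, 0.05 per-unit voltage-difference limit, and 0.1 Hz slip limit are hypothetical example settings, not values from any particular relay or from the Argenta-Battle Creek installation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncCheckSettings:
    max_angle_deg: float = 20.0        # hypothetical phase-angle window
    max_voltage_diff_pu: float = 0.05  # hypothetical voltage-magnitude window
    max_slip_hz: float = 0.1           # hypothetical frequency-difference window


def wrap_angle_deg(angle: float) -> float:
    """Map an angle difference into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def permit_reclose(v_bus_pu: float, v_line_pu: float,
                   angle_bus_deg: float, angle_line_deg: float,
                   f_bus_hz: float, f_line_hz: float,
                   settings: SyncCheckSettings = SyncCheckSettings()) -> bool:
    """Allow the breaker to close only if both sides are close enough to synchronism."""
    angle_ok = abs(wrap_angle_deg(angle_bus_deg - angle_line_deg)) <= settings.max_angle_deg
    voltage_ok = abs(v_bus_pu - v_line_pu) <= settings.max_voltage_diff_pu
    slip_ok = abs(f_bus_hz - f_line_hz) <= settings.max_slip_hz
    return angle_ok and voltage_ok and slip_ok


# A 70-degree difference, as reported at Argenta-Battle Creek, would be blocked;
# the other quantities here are assumed for illustration.
print(permit_reclose(1.00, 0.97, 0.0, 70.0, 60.00, 60.05))  # blocked
print(permit_reclose(1.00, 0.97, 0.0, 10.0, 60.00, 60.05))  # permitted
```

Wrapping the angle difference keeps the check correct when the two measured angles straddle the +/-180 degree boundary.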



2. Western and Eastern Michigan Separate: 16:10:37 to 16:10:38 EDT 

The following key events occurred at 16:10:37-38: 

• 16:10:38.2: Hampton-Pontiac 345-kV line tripped 

• 16:10:38.4: Thetford-Jewell 345-kV line tripped 

After the Argenta lines tripped, the phase angle between eastern and western Michigan significantly 
increased. Hampton-Pontiac and Thetford-Jewell were the only lines connecting Detroit to the rest of the 
grid to the north and west. When these lines tripped out of service, it left the loads in Detroit, Toledo, 
Cleveland, and their surrounding areas served only by local generation and the lines north of Lake Erie 
connecting Detroit east to Ontario and the lines south of Lake Erie from Cleveland east to northwestern 
Pennsylvania. These trips completed the separation of the high voltage transmission system between 
eastern and western Michigan. 

Power system disturbance recorders at Keith and Lambton, Ontario, captured these events in the flows 
across the Ontario-Michigan interface, as shown in Figure IV.14. The plots show that the west-to-east 
Michigan separation (culminating with the Thetford-Jewell trip), combined a fraction of a second later 
with the trip of the Erie West-Ashtabula-Perry 345-kV line connecting Ohio and Pennsylvania, was the 
trigger for a sudden 3,700 MW power surge from Ontario into Michigan. When Thetford-Jewell tripped, 
power that had been flowing into Michigan and Ohio from western Michigan, western Ohio, and Indiana 
was cut off. The nearby Ontario recorders saw a pronounced impact as flows into Detroit readjusted to 
flow in from Ontario instead. 


On the boundary of northeastern Ohio and northwestern Pennsylvania, the Erie West-Ashtabula-Perry 
line was the last 345-kV link to the east for northern Ohio loads. When that line severed, all the power 
that moments before had flowed across Michigan and Ohio paths was now diverted in a counter-clockwise 
loop around Lake Erie through the single path left in eastern Michigan, pulling power out of 
Ontario, New York, and PJM. 
Figure IV.14 — Flows on Keith-Waterman 230-kV Ontario-Michigan Tie Line 


Figure IV.15a shows the results of modeling of line loadings on the Ohio, Michigan, and other regional 
interfaces for the period from 16:05:57 until the Thetford-Jewell trip; this helps to illustrate how 
power flows shifted during this period. Evolving system conditions were modeled for August 14, based 
on the 16:05:50 power flow case developed by the MAAC-ECAR-NPCC Operations Studies Working 
Group. Each horizontal line in the graph represents a single 345-kV line or a set of 345-kV lines and its loading as a 
percentage of normal rating over time as first one, then another, set of circuits tripped out of service. In 
general, each subsequent line trip causes the remaining line loadings to rise. Note that Muskingum and 
East Lima-Fostoria Central were overloaded before they tripped, but the Michigan west and north 
interfaces were not overloaded before they tripped. Erie West-Ashtabula-Perry was loaded to 130 percent 
after the Hampton-Pontiac and Thetford-Jewell trips. 

The regional interface loadings graph (Figure IV.15b) shows that loadings at the interfaces between 
PJM-New York, New York-Ontario, and New York-New England were well within normal ratings before the 
east-west Michigan separation. 
Figure IV.15a — Simulated 345-kV Line Loadings From 16:05:57 Through 16:10:38.6 
Figure IV.15b — Simulated Regional Interface Loadings From 16:05:57 Through 16:10:38.4 
3. Large Counter-clockwise Power Surge around Lake Erie: 16:10:38.6 EDT 

The following key events occurred at 16:10:38: 

• 16:10:38.2: Hampton-Pontiac 345-kV line tripped 

• 16:10:38.4: Thetford-Jewell 345-kV line tripped 

• 16:10:38.6: Erie West-Ashtabula-Perry 345-kV line tripped at Perry 

• 16:10:38.6: Large power surge to serve loads in eastern Michigan and northern Ohio swept 
across Pennsylvania, New Jersey, and New York through Ontario into Michigan 

Perry-Ashtabula was the last 345-kV line connecting northern Ohio to the east along the southern shore of 
Lake Erie. This line tripped at the Perry substation on a zone 3 relay operation and separated the northern 
Ohio 345-kV transmission system from Pennsylvania and all 345-kV connections. After this trip, the 
load centers in eastern Michigan and northern Ohio (Detroit, Cleveland, and Akron) remained connected 
to the rest of the Eastern Interconnection only to the north of Lake Erie at the interface between the 
Michigan and Ontario systems (Figure IV.16). Eastern Michigan and northern Ohio now had little 
internal generation left and voltage was declining. The frequency in the Cleveland area dropped rapidly, 
and between 16:10:39 and 16:10:50, underfrequency load shedding in the Cleveland area interrupted 
about 1,750 MW of load. However, the load shedding was not enough to reach a balance with local 
generation and arrest the frequency decline. The still-heavy loads in Detroit and Cleveland drew power 
over the only major transmission path remaining: the lines from eastern Michigan east into Ontario. 



Figure IV.16 — Michigan Lines Trip and Ohio Separates From Pennsylvania, 16:10:36 to 16:10:38.6 
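Why the Cleveland-area underfrequency load shedding could not arrest the decline can be seen from an aggregate frequency model of an island: the rate of change of frequency is roughly df/dt = f0 * (Pgen - Pload) / (2 * H * S), so frequency keeps falling as long as the load shed is smaller than the generation deficit. The following sketch assumes a single lumped inertia constant H on the online generation base, frequency-insensitive load, no governor response, and three hypothetical shedding stages of 10 percent each; it does not model the generator underfrequency trips that followed in reality.

```python
def simulate_island(gen_mw: float, load_mw: float, stages: list[tuple[float, float]],
                    inertia_h_s: float = 4.0, f0_hz: float = 60.0,
                    dt_s: float = 0.01, t_end_s: float = 5.0) -> tuple[float, float]:
    """Return (frequency at t_end, total MW shed) for a lumped, lossless island.

    stages: (threshold_hz, fraction_of_initial_load) pairs for UFLS steps.
    """
    freq = f0_hz
    load = load_mw
    shed_total = 0.0
    operated = [False] * len(stages)
    for _ in range(round(t_end_s / dt_s)):
        for i, (threshold_hz, fraction) in enumerate(stages):
            if not operated[i] and freq < threshold_hz:
                operated[i] = True
                block = fraction * load_mw
                load -= block
                shed_total += block
        # Aggregate swing equation with H expressed on the online generation base.
        dfdt = f0_hz * (gen_mw - load) / (2.0 * inertia_h_s * gen_mw)
        freq += dfdt * dt_s
    return freq, shed_total


UFLS_STAGES = [(59.3, 0.10), (59.0, 0.10), (58.7, 0.10)]  # hypothetical settings

small_deficit = simulate_island(gen_mw=900.0, load_mw=1000.0, stages=UFLS_STAGES)
large_deficit = simulate_island(gen_mw=600.0, load_mw=1000.0, stages=UFLS_STAGES)
print(f"100 MW deficit: f = {small_deficit[0]:.2f} Hz after shedding {small_deficit[1]:.0f} MW")
print(f"400 MW deficit: f = {large_deficit[0]:.2f} Hz after shedding {large_deficit[1]:.0f} MW")
```

Under these assumptions, the first stage alone balances a 100 MW deficit and frequency settles just below 59.3 Hz, whereas with a 400 MW deficit all 300 MW of shedding leaves a residual imbalance and frequency keeps declining.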

At 16:10:38.6, after the 345-kV transmission paths in Michigan and Ohio tripped, the power that had been 
flowing at modest levels into Michigan from Ontario suddenly jumped in magnitude. While flows from 
Ontario into Michigan had been in the 250 to 350 MW range since 16:10:09.06, with this new surge the 
flows peaked at 3,700 MW at 16:10:39 (Figure IV.17). Electricity moved along a giant loop from the rest 
of the Eastern Interconnection through Pennsylvania and into New York and Ontario, and then into 
Michigan via the remaining transmission path to serve the combined loads of Cleveland, Toledo, and 
Detroit (Figure IV.18). This sudden large change in power flows lowered voltages and increased current 
levels on the transmission lines along the Pennsylvania-New York transmission interface. 

Figure IV.17 — Real and Reactive Power and Voltage From Ontario Into Michigan 
This power surge was of such a large magnitude that frequency was not the same across the Eastern 
Interconnection. The power swing resulted in a rapid rate of voltage decay. Flows into Detroit exceeded 
3,700 MW and 1,500 Mvar; the power surge was draining real power out of the northeast, causing 
voltages in Ontario and New York to drop. At the same time, local voltages in the Detroit area were 
plummeting because Detroit had already lost 500 MW of local generation. The electric system in the 
Detroit area would soon lose synchronism and black out (as evidenced by the rapid power oscillations 
decaying after 16:10:43). 

Just before the Argenta-Battle Creek trip, when Michigan separated west-to-east at 16:10:37, almost all of 
the generators in the Eastern Interconnection were operating in synchronism with the overall grid 
frequency of 60 Hertz, but when the large swing started, those machines began to swing dynamically. 
After the 345-kV line trip at 16:10:38, the Northeast entered a period of transient instability and loss of 
generator synchronism. Between 16:10:38 and 16:10:41, the power swings caused a sudden localized 
increase in system frequency, hitting 60.7 Hz at Lambton and 60.4 Hz at Niagara. 

Because the demand for power in Michigan, Ohio, and Ontario was drawing on lines through New York 
and Pennsylvania, heavy power flows were moving northward from New Jersey over the New York tie 
lines to meet those power demands, exacerbating the power swing. Figure IV.19 shows actual net line 
flows summed across the interfaces between the main regions affected by these swings: Ontario into 
Michigan, New York into Ontario, New York into New England, and PJM into New York. This shows 
that the power swings did not move in unison across every interface at every moment, but varied in 
magnitude and direction. This occurred for two reasons. First, the availability of lines across each 
interface varied over time, as did the amount of load that drew upon each interface, so net flows across 
each interface were not facing consistent demand with consistent capability as the cascade progressed. 
Second, the speed and magnitude of the swing were moderated by the inertia, reactive power capabilities, 
loading conditions, and locations of the generators across the entire region. 

After Cleveland was cut off from Pennsylvania and eastern power sources, Figure IV.19 also shows the 
start of the dynamic power swing at 16:10:38.6. Because the loads of Cleveland, Toledo, and Detroit 
(less the load already blacked out) were now served through Michigan and Ontario, this forced a large 
shift in power flows to meet that demand. As noted above, flows from Ontario into Michigan increased 
from 1,000 MW to a peak of 3,700 MW shortly after the start of the swing, while flows from PJM into 
New York were close behind (Figure IV.20). But within one second after the peak of the swing, at 
16:10:40, flows reversed and flowed back from Michigan into Ontario at the same time that frequency at 
the interface dropped (Figure IV.21). The large load and imports into northern Ohio were losing 
synchronism with southeastern Michigan. Flows that had been westbound across the Ontario-Michigan 
interface by more than 3,700 MW at 16:10:38.8 reversed to 2,100 MW eastbound by 16:10:40, and then 
returned westbound starting at 16:10:40.5. 
Figure IV.19 — Measured Power Flows and Frequency Across Regional Interfaces, 16:10:30 to 16:11:00, With Key Events in the Cascade 
Figure IV.20 — Power Flows at 16:10:40 

Two 345-kV lines tripped because of zone 1 relay action along the border between PJM and the NYISO 
due to the transient overloads and depressed voltage. After the separation from PJM, the dynamic surges 
also drew power from New England and the Maritimes. The combination of the power surge and 
frequency rise caused 380 MW of pre-selected Maritimes generation to trip off-line due to the operation 
of the New Brunswick Power “Loss of Line 3001” Special Protection System. Although this system was 
designed to respond to failures of the 345-kV link between the Maritimes and New England, it operated in 
response to the effects of the power surge. The link remained intact during the event. 



Figure IV.21 — Power Flows at 16:10:41 
4. Northern Ohio and Eastern Michigan Systems Degraded Further: 16:10:39 to 16:10:46 EDT 

The following events occurred in northern Ohio and eastern Michigan over a period of seven seconds 
from 16:10:39 to 16:10:46: 

Line trips in Ohio and eastern Michigan: 

• 16:10:39.5: Bayshore-Monroe 345-kV line 

• 16:10:39.6: Allen Junction-Majestic-Monroe 345-kV line 

• 16:10:40.0: Majestic-Lemoyne 345-kV line 

• Majestic 345-kV Substation: one terminal opened sequentially on all 345-kV lines 

• 16:10:41.8: Fostoria Central-Galion 345-kV line 

• 16:10:41.911: Beaver-Davis Besse 345-kV line 

Underfrequency load shedding in Ohio: 

• FirstEnergy shed 1,754 MVA load 

• AEP shed 133 MVA load 

Six power plants, for a total of 3,097 MW of generation, tripped off-line in Ohio: 

• 16:10:42: Bay Shore Units 1-4 (551 MW near Toledo) tripped on over-excitation 

• 16:10:40: Lakeshore unit 18 (156 MW, near Cleveland) tripped on underfrequency 

• 16:10:41.7: Eastlake 1, 2, and 3 units (304 MW total, near Cleveland) tripped on underfrequency 

• 16:10:41.7: Avon Lake unit 9 (580 MW, near Cleveland) tripped on underfrequency 

• 16:10:41.7: Perry 1 nuclear unit (1,223 MW, near Cleveland) tripped on underfrequency 

• 16:10:42: Ashtabula unit 5 (184 MW, near Cleveland) tripped on underfrequency 

Five power plants producing 1,630 MW tripped off-line near Detroit: 

• 16:10:42: Greenwood unit 1 tripped (253 MW) on low voltage, high current 

• 16:10:41: Belle River unit 1 tripped (637 MW) on out-of-step 

• 16:10:41: St. Clair unit 7 tripped (221 MW, DTE unit) on high voltage 

• 16:10:42: Trenton Channel units 7A, 8, and 9 tripped (648 MW) 

• 16:10:43: West Lorain units (296 MW) tripped on under-voltage 

In northern Ohio, the trips of the Bay Shore-Monroe, Majestic-Lemoyne, Allen Junction-Majestic- 
Monroe 345-kV lines, and the Ashtabula 345/138-kV transformer cut off Toledo and Cleveland from the 
north, turning that area into an electrical island (Figure IV.22). After these 345-kV line trips, the high 
power imports from southeastern Michigan into Ohio suddenly stopped at 16:10:40. Frequency in this 
island began to fall rapidly. This caused a series of power plants in the area to trip off-line due to the 
operation of underfrequency relays, including the Bay Shore units. Cleveland area load was disconnected 
by automatic underfrequency load shedding (approximately 1,300 MW), and another 434 MW of load 
was interrupted after the generation remaining within this transmission island was tripped by 
underfrequency relays. This sudden load drop would contribute to the reverse power swing described 
previously. In its own island, portions of Toledo blacked out from automatic underfrequency load 
shedding but most of the Toledo load was restored by automatic reclosing of lines such as the East Lima- 
Fostoria Central 345-kV line and several lines at the Majestic 345-kV substation. 



Figure IV.22 — Cleveland and Toledo Islanded, 16:10:39 to 16:10:46 


The Perry nuclear plant is located in Ohio on Lake Erie, not far from the Pennsylvania border. The Perry 
plant was inside the decaying electrical island and tripped soon thereafter on underfrequency, as designed. 
A number of other units near Cleveland tripped off-line by underfrequency protection. Voltage in the 
island dropped, causing the Beaver-Davis Besse 345-kV line between Cleveland and Toledo to trip. This 
marked the end for Cleveland, which could not sustain itself as a separate island. However, by separating 
from Cleveland, Toledo was able to resynchronize with the rest of the Eastern Interconnection once the 
phase angle across the open East Lima-Fostoria 345-kV line came back within its limits and the line reclosed. 

The large power surge into Michigan, beginning at 16:10:38, occurred when Toledo and Cleveland were 
still connected to the grid through Detroit. After the Bayshore-Monroe line tripped at 16:10:39, Toledo 
and Cleveland separated into their own island, dropping a large amount of load off of the Detroit system. 
This suddenly left Detroit with excess generation, much of which greatly accelerated in angle as the 
depressed voltage in Detroit (caused by the high demand in Cleveland) caused the Detroit units to begin 
to pull out of step with the rest of the grid. When voltage in Detroit returned to near-normal, the 
generators could not sufficiently decelerate to remain synchronous. This out-of-step condition is evident 
in Figure IV.23, which shows at least two sets of generator “pole slips” by plants in the Detroit area 
between 16:10:40 and 16:10:42. Several large units around Detroit (Belle River, St. Clair, Greenwood, 
Monroe, and Fermi Nuclear) all tripped in response. After the Cleveland-Toledo island formed at 
16:10:40, Detroit frequency spiked to almost 61.7 Hz before dropping and momentarily equalizing between 
the Detroit and Ontario systems. But Detroit frequency then began to decay at 2 Hz/sec and the 
generators experienced under-speed conditions. 
Figure IV.23 — Generators Under Stress in Detroit, as Seen From Keith PSDR 
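The mechanism described above, in which generators accelerate while depressed voltage limits their electrical output and then cannot decelerate enough once voltage recovers, is the classic transient-stability problem captured by the swing equation (2H/omega0) * d^2(delta)/dt^2 = Pm - Pmax * sin(delta). The following sketch integrates this equation for a single machine connected to a stiff system; the per-unit powers, the inertia constant, and the idea of modeling the voltage dip as a temporary drop in Pmax are simplifying assumptions for illustration, not a model of the Detroit-area units.

```python
import math


def pole_slips(t_depressed_s: float, pm: float = 0.8, pmax_normal: float = 2.0,
               pmax_depressed: float = 0.5, inertia_h_s: float = 4.0,
               f0_hz: float = 60.0, dt_s: float = 0.001, t_end_s: float = 3.0) -> bool:
    """Return True if the rotor angle passes 180 degrees, i.e. the unit slips a pole.

    The voltage dip is modeled as Pmax = pmax_depressed for t < t_depressed_s,
    after which transfer capability returns to pmax_normal. Damping is ignored.
    """
    omega0 = 2.0 * math.pi * f0_hz
    delta = math.asin(pm / pmax_normal)  # pre-disturbance rotor angle (rad)
    speed_dev = 0.0                      # rotor speed deviation from synchronous (rad/s)
    for step in range(round(t_end_s / dt_s)):
        t = step * dt_s
        pmax = pmax_depressed if t < t_depressed_s else pmax_normal
        accel = omega0 / (2.0 * inertia_h_s) * (pm - pmax * math.sin(delta))
        speed_dev += accel * dt_s        # semi-implicit Euler: update speed first
        delta += speed_dev * dt_s
        if delta > math.pi:
            return True
    return False


for dip_s in (0.2, 0.6):
    print(f"voltage depressed for {dip_s} s -> pole slip: {pole_slips(dip_s)}")
```

With these assumed values, a 0.2-second dip leaves the machine swinging but in synchronism, while a 0.6-second dip lets the rotor gain enough speed that it passes 180 degrees and slips a pole.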
Re-examination of Figure IV.19 shows the power swing from the northeast through Ontario into 
Michigan and northern Ohio that began at 16:10:37, and how it reverses and swings back around Lake 
Erie at 16:10:39. That return was caused by the combination of natural oscillation accelerated by major 
load losses, as the northern Ohio system disconnected from Michigan. It caused a power flow change of 
5,800 MW, from 3,700 MW westbound to 2,100 MW eastbound across the Ontario-to-Michigan border 
between 16:10:39.5 and 16:10:40. Since the system was now fully dynamic, this large oscillation 
eastbound would lead naturally to a rebound, which began at 16:10:40 with an inflection point reflecting 
generation shifts between Michigan and Ontario and additional line losses in Michigan. 

5. Western Pennsylvania-New York Separation: 16:10:39 to 16:10:44 EDT 

The following events occurred over a five-second period from 16:10:39 to 16:10:44, beginning the 
separation of New York and Pennsylvania: 

• 16:10:39: Homer City-Watercure Road 345-kV 

• 16:10:39: Homer City-Stolle Road 345-kV 

• 16:10:44: South Ripley-Erie East 230-kV, and South Ripley-Dunkirk 230-kV 

• 16:10:44: East Towanda-Hillside 230-kV 


Responding to the swing of power out of Michigan toward Ontario and into New York and PJM, zone 1 
relays on the 345-kV lines separated Pennsylvania from New York (Figure IV.24). Homer City-Watercure 
(177 miles) and Homer City-Stolle Road (207 miles) are relatively long lines with high impedances. 
Zone 1 relays do not have timers and therefore operate nearly instantly when a power swing enters the 
relay target circle. On normal-length lines, zone 1 relays have small target circles because the relay 
measures less than the full length of the line; on a long line, however, the greater impedance enlarges 
the target circle and makes it more likely that a power swing will enter it. The Homer City-Watercure 
and Homer City-Stolle Road lines do not have zone 3 relays. 

Given the length and impedance of these lines, it was highly likely that they would trip and separate in the 
face of such large power swings. Most of the other interfaces between regions have shorter ties. For 
instance, the ties between New York and Ontario and between Ontario and Michigan are only about two 
miles long, so they are electrically very short, have much lower impedance, and trip less easily than these 
long lines. A zone 1 relay target on a short line covers a small distance, so a power swing is less likely to 
enter the relay target circle at all, averting a zone 1 trip. 
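The geometry behind this argument can be sketched by assuming a mho-type zone 1 characteristic: a circle on the impedance plane that passes through the origin, with its diameter running out to the relay's reach point along the line impedance. The per-mile impedance, the 80 percent reach, and the swing impedance below are hypothetical values chosen for illustration, not settings of the Homer City or New York-Ontario relays. The same apparent impedance during a swing falls inside the zone 1 circle of a 177-mile line but far outside that of a two-mile tie.

```python
def zone1_sees_fault(z_apparent: complex, line_miles: float,
                     z_per_mile: complex = complex(0.03, 0.6),
                     reach_fraction: float = 0.8) -> bool:
    """Return True if an apparent impedance (ohms) lies inside a mho zone 1 circle.

    Hypothetical settings: z_per_mile is a rough 345-kV line impedance and
    reach_fraction is an assumed zone 1 reach as a share of the line length.
    """
    z_reach = reach_fraction * line_miles * z_per_mile   # zone 1 reach point
    center = z_reach / 2                                  # circle passes through the origin
    radius = abs(z_reach) / 2
    return abs(z_apparent - center) < radius

swing_point = complex(20, 40)   # hypothetical apparent impedance during a swing, ohms
print(zone1_sees_fault(swing_point, 177))   # long line: True
print(zone1_sees_fault(swing_point, 2))     # short tie: False
```

A real swing traces a path across the impedance plane rather than sitting at one point; checking each sample of that path with the same function would show whether and when it enters the circle.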



Figure IV.24 — Western Pennsylvania Separates From New York, 16:10:39 to 16:10:44 


At 16:10:44 (see Figure IV.25), the northern part of the Eastern Interconnection (including eastern 
Michigan) was connected to the rest of the Interconnection at only two locations: (1) in the east through 
the 500-kV and 230-kV ties between New York and northeastern New Jersey, and (2) in the west through 
the long and electrically fragile 230-kV transmission path connecting Ontario to Manitoba and Minnesota. 
The separation of New York from Pennsylvania (leaving only the lines from New Jersey into New York 
connecting PJM to the northeast) helped to buffer PJM in part from these swings. Frequency was high in 
Ontario at that point, indicating that there was more generation than load, so much of this flow reversal 
never got past Ontario into New York. 
Figure IV.25 — Power Flows at 16:10:44 

6. Final Separation of the Northeast from the Eastern Interconnection: 16:10:43 to 16:10:45 EDT 

The following line trips between 16:10:43 and 16:10:45 resulted in the northeastern United States and 
eastern Canada becoming an electrical island completely separated from the rest of the Eastern 
Interconnection: 

• 16:10:43: Keith-Waterman 230-kV line tripped 

• 16:10:45: Wawa-Marathon 230-kV lines tripped 

• 16:10:45: Branchburg-Ramapo 500-kV line tripped 

At 16:10:43, eastern Michigan was still connected to Ontario, but the Keith-Waterman line, which forms 
part of that interface, tripped on apparent impedance. This put more power onto the remaining interface 
between Ontario and Michigan and triggered sustained oscillations in both power flow and frequency 
along the remaining 230-kV line. 

At 16:10:45, northwest Ontario separated from the rest of Ontario when the Wawa-Marathon 230-kV 
lines (168 km long) disconnected along the northern shore of Lake Superior, tripped by zone 1 distance 
relays at both ends. This separation left the loads in the far northwest portion of Ontario connected to the 
Manitoba and Minnesota systems, and protected them from the blackout (Figure IV.26). 
Figure IV.26 — Northeast Separates From Eastern Interconnection, 16:10:45 
As shown in Figure IV.27, the 69-mile-long Branchburg-Ramapo line and Ramapo transformer between 
New Jersey and New York was the last major transmission path remaining between the Eastern 
Interconnection and the area ultimately affected by the blackout. Figure IV.28 shows how that line 
disconnected at 16:10:45, along with other underlying 230-kV and 138-kV lines in northeastern New Jersey. 
Branchburg-Ramapo was carrying over 3,000 MVA and 4,500 amps with voltage at 79 percent before it 
tripped, either on a high-speed swing into zone 1 or on a direct transfer trip. The investigation team is 
still examining why the higher-impedance 230-kV overhead lines tripped while the underground 
Hudson-Farragut 230-kV cables did not; the available data suggest that the lower impedance of the 
underground cables made them less vulnerable to the electrical strain placed on the system. 
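The current, voltage, and MVA figures reported for Branchburg-Ramapo are mutually consistent. A quick check uses the standard balanced three-phase relation S = √3 · V_LL · I, assuming that the 79 percent refers to the 500-kV nominal line-to-line voltage and that the current and voltage were measured at the same point:

```python
import math

def three_phase_mva(v_ll_kv: float, current_a: float) -> float:
    """Apparent power of a balanced three-phase circuit: S = sqrt(3) * V_LL * I, in MVA."""
    return math.sqrt(3) * v_ll_kv * 1e3 * current_a / 1e6

# Assumption: 79 percent of the 500-kV nominal line-to-line voltage
v_kv = 0.79 * 500            # about 395 kV
s_mva = three_phase_mva(v_kv, 4500)
print(round(s_mva))          # prints 3079
```

The result, roughly 3,080 MVA, agrees with the report's "over 3,000 MVA."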
Figure IV.28 — PJM to New York Ties Disconnect 

This left the northeast portion of New Jersey connected to New York, while Pennsylvania and the rest of 
New Jersey remained connected to the rest of the Eastern Interconnection. Within northeastern New 
Jersey, the separation occurred along the 230-kV corridors, which are the main supply feeds into the 
northern New Jersey area (the two Roseland-Athenia circuits and the Finden-Bayway circuit). These 
circuits supply the large customer load in northern New Jersey and are a primary route for power transfers 
into New York City, so they are usually more highly loaded than other interfaces. These lines tripped 
west and south of the large customer loads in northeast New Jersey. 

The separation of New York, Ontario, and New England from the rest of the Eastern Interconnection 
occurred due to natural breaks in the system and automatic relay operations, which performed exactly as 
designed. No operator intervention caused or influenced this split. At this point, the Eastern 
Interconnection was divided into two major sections. To the north and east of the separation point lay 
New York City, northern New Jersey, New York state, New England, the Canadian Maritime provinces, 
eastern Michigan, the majority of Ontario, and the Quebec system. 

The rest of the Eastern Interconnection, to the south and west of the separation boundary, was not 
seriously affected by the blackout. Approximately 3,700 MW of excess generation in the main portion of 
the Eastern Interconnection that was on-line to export into the Northeast was now separated from the load 
it had been serving. This left the northeastern island with even less in-island generation on-line as it 
attempted to stabilize during the final phase of the cascade. 

E. Electrical Islands Seek Equilibrium: 16:10:46 to 16:12 EDT 

1. Overview 

During the next three seconds, the islanded northern section of the Eastern Interconnection broke apart 
internally. 

• New York-New England upstate transmission lines disconnected: 16:10:46 to 16:10:47 

• New York transmission system split along Total East interface: 16:10:49 

• The Ontario system just west of Niagara Falls and west of St. Lawrence separated from the 
western New York island: 16:10:50 

Figure IV.29 illustrates the events of this phase. 



Figure IV.29 — New York and New England Separate, Multiple Islands Form 

A half minute later, two more separations occurred: 

• Southwestern Connecticut separates from New York City: 16:11:22 

• Remaining transmission lines between Ontario and eastern Michigan separate: 16:11:57 
By this point, most portions of the affected area were blacked out. This last phase of the cascade was 
principally about the search for balance between loads and generation in the various islands that had 
formed. The primary mechanism for reaching that balance was underfrequency load shedding (UFLS). 

The following UFLS operated on the afternoon of August 14: 

• Ohio shed over 1,883 MW beginning at 16:10:39 

• Michigan shed a total of 2,835 MW 

• New York shed a total of 10,648 MW in several steps, beginning at 16:10:48 

• PJM shed a total of 1,324 MW in three steps in northern New Jersey, beginning at 16:10:48 

• New England shed a total of 1,098 MW 

The entire northeastern system was experiencing large-scale, dynamic oscillations during this period. 

Even if the UFLS and generation had been perfectly balanced at any moment in time, these oscillations 
would have made stabilization difficult and unlikely. Figure IV.30 gives an overview of the power flows 
and frequencies during the period 16:10:45 through 16:11:00, capturing most of the key events in the final 
phase of the cascade. 


July 13, 2004 


August 14, 2003, Blackout 
Final NERC Report 

Section IV 

Cascading Failure of the Power System 

[Plot: power flows (MW) and frequency versus time, labeled 16:10:45.2, 16:10:49, 16:10:56, and 16:11:10] 

Figure IV.30 — Measured Power Flows and Frequency Across Regional Interfaces, 16:10:45 to 16:11:30, With Key Events in the Cascade 
After the blackout of 1965, the utilities serving New York City and neighboring northern New Jersey 
increased the integration between the systems serving this area to increase the flow capability into New 
York and improve the reliability of the system as a whole. The combination of the facilities in place and 
the pattern of electrical loads and flows on August 14 caused New York to be tightly linked electrically to 
northern New Jersey and southwestern Connecticut, and moved previously existing weak spots on the 
grid out past this combined load and network area. 

2. New York-New England Separation: 16:10:46 to 16:10:54 EDT 

Prior to New England’s separation from the Eastern Interconnection at approximately 16:11, voltages 
became depressed due to the large power swings occurring across the interconnection while trying to feed 
the collapsing areas to the west. Immediately following the separation of New England and the 
Maritimes from the Eastern Interconnection, the Connecticut transmission system voltages went high. 

This was the result of capacitors remaining in service, load loss, reduced reactive losses on transmission 
circuits, and loss of generation to regulate the system voltage. Overvoltage protective relays operated, 
tripping both transmission and distribution capacitors across the Connecticut system. In addition, the load 
in the area of Connecticut that was still energized began to increase during the first 7 to 10 minutes 
following the initial separation as loads reconnected. This increase was most likely due to customers 
restoring process load that had tripped during the transient instability. The load increase, combined with 
the capacitor tripping, drove transmission voltages from high to low within approximately five minutes. 
To stabilize the system, New England operators called on all fast-start generation by 16:16 and took 
decisive action to manually drop approximately 80 MW of load in 
southwest Connecticut by 16:39. They dropped another 325 MW in Connecticut and 100 MW in western 
Massachusetts by 16:40. These measures helped to stabilize the New England and Maritime island 
following their separation from the rest of the Eastern Interconnection. 
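The voltage rise and subsequent fall in Connecticut follow from two standard relationships: a fixed shunt capacitor's reactive output varies with the square of its terminal voltage, and a line's reactive loss varies with the square of its current. The sketch below uses hypothetical per-unit and Mvar values to show both effects; it is not a model of the Connecticut system.

```python
def shunt_capacitor_mvar(rated_mvar: float, v_pu: float) -> float:
    """Reactive output of a fixed shunt capacitor bank: Q = Q_rated * V^2 (V in per unit)."""
    return rated_mvar * v_pu ** 2

def line_reactive_loss_pu(s_pu: float, v_pu: float, x_pu: float) -> float:
    """Reactive loss I^2 * X of a line, with current I = S / V (all per unit)."""
    current_pu = s_pu / v_pu
    return current_pu ** 2 * x_pu

# Hypothetical values for illustration only.
for v in (0.95, 1.00, 1.10):
    print(f"100-Mvar bank at {v:.2f} pu: {shunt_capacitor_mvar(100, v):.1f} Mvar")

heavy = line_reactive_loss_pu(0.8, 1.0, 0.1)   # heavily loaded line before separation
light = line_reactive_loss_pu(0.2, 1.0, 0.1)   # same line after load loss
print(f"reactive loss falls by a factor of {heavy / light:.0f}")
```

For example, a 100-Mvar bank delivers about 121 Mvar at 1.10 pu, and cutting a line's flow from 0.8 pu to 0.2 pu at constant voltage reduces its reactive loss by a factor of 16. A surplus of reactive supply therefore pushes voltage up after load is lost, and once overvoltage relays remove the capacitors, that support is no longer available when load returns.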

Between 16:10:46 and 16:10:54, the separation between New England and New York occurred along five 
northern ties and seven ties within southwestern Connecticut. At the time of the east-west separation in 
New York at 16:10:49, New England was isolated from the eastern New York island. The only 
remaining tie was the PV-20 circuit connecting New England and the western New York island, which 
tripped at 16:10:54. Because New England was exporting to New York before the disturbance across the 
southwestern Connecticut tie, but importing on the Norwalk-Northport tie, the Pleasant Valley path 
opened east of Long Mountain (in other words, internal to southwestern Connecticut) rather than along 
the actual New York-New England tie. Immediately before the separation, the power swing out of New 
England occurred because the New England generators had increased output in response to the drag of 
power through Ontario and New York into Michigan and Ohio. The power swings continuing through 
the region caused this separation and caused Vermont to lose approximately 70 MW of load. 

When the ties between New York and New England disconnected, most of New England, along with the 
Maritime provinces of New Brunswick and Nova Scotia, became an island with generation and demand 
balanced sufficiently close that it was able to remain operational. The New England system had been 
exporting close to 600 MW to New York, so it was relatively generation-rich and experienced continuing 
fluctuations until it reached equilibrium. Before the Maritimes and New England separated from the 
Eastern Interconnection at approximately 16:11, voltages became depressed across portions of New 
England and some large customers disconnected themselves automatically. However, southwestern 
Connecticut separated from New England and remained tied to the New York system for about one 
minute. 

While frequency within New England fluctuated slightly and recovered quickly after 16:10:40, frequency 
in the New York-Ontario-Michigan-Ohio island varied severely as additional lines, loads, and generators 
tripped, reflecting the magnitude of the generation deficiency in Michigan and Ohio. 
Due to its geography and electrical characteristics, the Quebec system in Canada is tied to the remainder 
of the Eastern Interconnection via high-voltage DC links instead of AC transmission lines. Quebec was 
able to survive the power surges with only small impacts because the DC connections shielded it from the 
frequency swings. At the same time, the DC ties into upper New York and New England served as 
resources to stabilize those two islands and helped keep them energized during the cascade. 

3. New York Transmission Split East-West: 16:10:49 EDT 

The transmission system split internally within New York along the Total East interface, with the eastern 
portion islanding to contain New York City, northern New Jersey, and southwestern Connecticut. The 
eastern New York island had been importing energy, so it did not have enough surviving generation 
on-line to balance load. Frequency declined quickly to below 58.0 Hz and triggered 7,115 MW of automatic 
UFLS. Frequency declined further, as did voltage, causing pre-designed trips at the Indian Point nuclear 
plant and other generators in and around New York City through 16:11:10. New York’s Total East and 
Central East interfaces, where the New York internal split occurred, are routinely among the most heavily 
loaded paths in the state and are operated under thermal, voltage, and stability limits to respect their 
relative vulnerability and importance. 

Examination of the loads and generation in the eastern New York island indicates that before 16:10:00, 
the area had been importing electricity and had less generation on-line than load. At 16:10:50, seconds 
after the separation along the Total East interface, the eastern New York area had experienced significant 
load reductions due to UFLS — Consolidated Edison, which serves New York City and surrounding 
areas, dropped more than 40 percent of its load on automatic UFLS. But at this time, the system was still 
experiencing dynamic conditions; as illustrated in Figure IV.31, frequency was falling, flows and voltages 
were oscillating, and power plants were tripping off-line. Had there been a slow islanding situation and 
more generation on-line, it might have been possible for the eastern New York island to rebalance given 
its high level of UFLS. However, events happened so quickly and the power swings were so large that 
rebalancing would have been unlikely, with or without the northern New Jersey and southwestern 
Connecticut loads hanging onto eastern New York. This was further complicated because the high rate of 
change in voltages at load buses reduced the actual levels of load shed by UFLS relative to the levels 
needed and expected. 

4. Western New York-Ontario Interface 

The Ontario system separated from the western New York island just west of Niagara Falls and west of 
St. Lawrence at 16:10:50. This separation was due to relay operations that disconnected nine 230-kV 
lines within Ontario, leaving most of Ontario isolated. Ontario’s large Beck and Saunders hydro 
stations, along with some Ontario load, the NYPA Niagara and St. Lawrence hydro stations, and NYPA’s 
765-kV AC interconnection to their HVDC tie with Quebec, remained connected to the western New 
York system, supporting the demand in upstate New York. From 16:10:49 to 16:10:50, frequency in 
Ontario declined below 59.3 Hz, initiating automatic UFLS (3,000 MW). This load shedding dropped 
about 12 percent of Ontario’s remaining load. Between 16:10:50 and 16:10:56, the isolation of Ontario’s 
2,300 MW Beck and Saunders hydro units onto the western New York island, coupled with UFLS, 
caused the frequency in this island to rise to 63.4 Hz due to excess generation relative to the load 
remaining within the island. This is shown in Figure IV.31. The high frequency caused trips of five of 
the U.S. nuclear units within the island, and the last one tripped on the second frequency rise. 
Figure IV.31 — Separation of Ontario and Western New York 


Three of the 230-kV transmission circuits reclosed near Niagara automatically to reconnect Ontario to 
New York at 16:10:56. Even with these lines reconnected, the main Ontario island (still attached to New 
York and eastern Michigan) was extremely deficient in generation, so its frequency declined towards 58.8 
Hz, the threshold for the second stage of UFLS. Over the next two seconds, another 19 percent of Ontario 
demand (4,800 MW) automatically disconnected by UFLS. At 16:11:10, these same three lines tripped a 
second time west of Niagara, and New York and most of Ontario separated for a final time. Following 
this separation, the frequency in Ontario declined to 56 Hz by 16:11:57. With Ontario still supplying 
2,500 MW to the Michigan-Ohio load pocket, the remaining ties with Michigan tripped at 16:11:57. 
Ontario system frequency declined, leading to a widespread shutdown at 16:11:58 and a loss of 22,500 
MW of load in Ontario, including the cities of Toronto, Hamilton, and Ottawa. 
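Underfrequency load shedding works in stages: each relay stage trips a block of load when frequency falls below its pickup setting. The sketch below is a hypothetical illustration that borrows the 59.3-Hz and 58.8-Hz pickups and the roughly 12 and 19 percent stage sizes mentioned for Ontario, but applies both fractions to a single assumed 25,000-MW base load and ignores relay time delays; it does not represent the actual IMO relay settings.

```python
from dataclasses import dataclass

@dataclass
class UflsStage:
    pickup_hz: float        # stage trips when frequency falls below this value
    fraction: float         # share of the base load shed by this stage
    tripped: bool = False   # each stage operates at most once

def run_ufls(freq_trace_hz, stages, base_load_mw):
    """Return the MW shed at each frequency sample.

    Simplifications (assumptions): no relay time delays, and the shed amount
    does not depend on bus voltage.
    """
    shed_per_sample = []
    for f in freq_trace_hz:
        shed = 0.0
        for stage in stages:
            if not stage.tripped and f < stage.pickup_hz:
                stage.tripped = True
                shed += stage.fraction * base_load_mw
        shed_per_sample.append(shed)
    return shed_per_sample

# Pickups follow the Ontario stages described above; fractions are applied to
# a hypothetical base load for illustration.
stages = [UflsStage(59.3, 0.12), UflsStage(58.8, 0.19)]
trace = [60.0, 59.5, 59.2, 59.0, 58.7, 58.5]
print(run_ufls(trace, stages, 25_000))
```

With this trace, the first stage operates at the 59.2-Hz sample and the second at the 58.7-Hz sample. Because each stage operates only once, a frequency that keeps falling after the last stage, as Ontario's did toward 56 Hz, leaves nothing further to shed in this simple model.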

5. Southwest Connecticut Separated From New York City: 16:11:22 EDT 

In southwestern Connecticut, when the Long Mountain-Plum Tree line (connected to the Pleasant Valley 
substation in New York) disconnected at 16:11:22, it left about 500 MW of demand supplied only 
through a 138-kV underwater tie to Long Island. About two seconds later, the two 345-kV circuits 
connecting southeastern New York to Long Island tripped, isolating Long Island and southwestern 
Connecticut, which remained tied together by the underwater Norwalk Harbor to Northport 138-kV cable. 
The cable tripped about 20 seconds later, causing southwestern Connecticut to black out. 

6. Western New York Stabilizes 


Within the western New York island, the 345-kV system remained intact from Niagara east to the Utica 
area, and from the St. Lawrence/Plattsburgh area south to the Utica area through both the 765-kV and 
230-kV circuits. Ontario’s Beck and Saunders generation remained connected to New York at Niagara 
and St. Lawrence, respectively, and this island stabilized with about 50 percent of the pre-event load 
remaining. The boundary of this island moved southeastward as a result of the reclosure of the 
Fraser-to-Coopers Corners 345-kV line at 16:11:23. 
As a result of the severe frequency and voltage changes, many large generating units in New York and 
Ontario tripped off-line. The eastern island of New York, including the heavily populated areas of 
southeastern New York, New York City, and Long Island, experienced severe frequency and voltage 
decline. At 16:11:29, the New Scotland-to-Leeds 345-kV circuits tripped, separating the eastern New 
York island into northern and southern sections. The small remaining load in the northern portion of the 
eastern island (the Albany area) retained electric service, supplied by local generation until it could be 
resynchronized with the western New York island. The southern island, including New York City, 
rapidly collapsed into a blackout. 

8. Remaining Transmission Lines Between Ontario and Eastern Michigan Separate: 16:11:57 EDT 

Before the blackout, New England, New York, Ontario, eastern Michigan, and northern Ohio were 
scheduled net importers of power. When the western and southern lines serving Cleveland, Toledo, and 
Detroit collapsed, most of the load remained on those systems, but some generation had tripped. This 
exacerbated the generation/load imbalance in areas that were already importing power. The power to 
serve this load came through the only major path available, via Ontario. After most of the IMO system was 
separated from New York and generation to the north and east, much of the Ontario load and generation 
was lost; it took only moments for the transmission paths west from Ontario to Michigan to fail. 

When the cascade was over at about 16:12, much of the disturbed area was completely blacked out, but 
isolated pockets still had service because load and generation had reached equilibrium. Ontario’s large 
Beck and Saunders hydro stations, along with some Ontario load, the NYPA Niagara and St. Lawrence 
hydro stations, and NYPA’s 765-kV AC interconnection to the Quebec HVDC tie, remained connected to 
the western New York system, supporting demand in upstate New York. 

Figure IV.32 shows frequency data collected by the distribution-level monitors of Softswitching 
Technologies, Inc. (a commercial power quality company serving industrial customers) for the area 
affected by the blackout. The data reveal at least five separate electrical islands in the Northeast as the 
cascade progressed. The two paths of red circles on the frequency scale reflect the Albany area island 
(upper path) versus the New York City island, which declined and blacked out much earlier. 


[Plot: frequency (Hz), about 45 to 65 Hz, versus time (EDT), 16:10:30 to 16:13:30] 

Figure IV.32 — Electric Islands Reflected in Frequency Plot 

9. Cascading Sequence Essentially Complete: 16:13 EDT 

Most of the Northeast (the area shown in gray in Figure IV.33) had now blacked out. Some isolated areas 
of generation and load remained on-line for several minutes. Some of those areas in which a close 
generation-demand balance could be maintained remained operational. 
Figure IV.33 — Areas Affected by the Blackout 


One relatively large island remained in operation, serving about 5,700 MW of demand, mostly in western 
New York. Ontario’s large Beck and Saunders hydro stations, along with some Ontario load, the NYPA 
Niagara and St. Lawrence hydro stations, and NYPA’s 765-kV AC interconnection with Quebec, 
remained connected to the western New York system, supporting demand in upstate New York. This 
island formed the basis for restoration in both New York and Ontario. 

The entire cascade sequence is summarized graphically in Figure IV.34. 

Figure IV.34 — Cascade Sequence Summary 




V. Conclusions and Recommendations 

A. General Conclusions 

The August 14 blackout had many similarities with previous large-scale blackouts, including the 1965 
Northeast blackout that was the basis for forming NERC in 1968, and the July 1996 outages in the West. 
Common factors include: conductor contacts with trees, inability of system operators to visualize events 
on the system, failure to operate within known safe limits, ineffective operational communications and 
coordination, inadequate training of operators to recognize and respond to system emergencies, and 
inadequate reactive power resources. 

The general conclusions of the NERC investigation are as follows: 

• Several entities violated NERC operating policies and planning standards, and those violations 
contributed directly to the start of the cascading blackout. 

• The approach used to monitor and ensure compliance with NERC and regional reliability 
standards was inadequate to identify and resolve specific compliance violations before those 
violations led to a cascading blackout. 

• Reliability coordinators and control areas have adopted differing interpretations of the functions, 
responsibilities, authorities, and capabilities needed to operate a reliable power system. 

• In some regions, data used to model loads and generators were inaccurate due to a lack of 
verification through benchmarking with actual system data and field testing. 

• Planning studies, design assumptions, and facilities ratings were not consistently shared and were 
not subject to adequate peer review among operating entities and regions. 

• Available system protection technologies were not consistently applied to optimize the ability to 
slow or stop an uncontrolled cascading failure of the power system. 

• Deficiencies identified in studies of prior large-scale blackouts were repeated, including poor 
vegetation management, operator training practices, and a lack of adequate tools that allow 
operators to visualize system conditions. 

B. Causal Analysis Results 

This section summarizes the causes of the blackout. Investigators found that the Sammis-Star 345-kV 
line trip was a seminal event, after which power system failures began to spread beyond northeastern 
Ohio to affect other areas. After the Sammis-Star line outage at 16:05:57, the accelerating cascade of line 
and generator outages would have been difficult or impossible to stop with installed protection and 
controls. Therefore, the causes of the blackout are focused on problems that occurred before the Sammis- 
Star outage. 

The causes of the blackout described here did not result from inanimate events, such as “the alarm 
processor failed” or “a tree contacted a power line.” Rather, the causes of the blackout were rooted in 
deficiencies resulting from decisions, actions, and the failure to act of the individuals, groups, and 
organizations involved. These causes were preventable prior to August 14 and are correctable. Simply 
put, blaming a tree for contacting a line serves no useful purpose. The responsibility lies with the 
organizations and persons charged with establishing and implementing an effective vegetation 
management program to maintain safe clearances between vegetation and energized conductors. 
Each cause identified here was verified to have existed on August 14 prior to the blackout. Each cause 
was also determined to be both a necessary condition to the blackout occurring and, in conjunction with 
the other causes, sufficient to cause the blackout. In other words, each cause was a direct link in the 
causal chain leading to the blackout and the absence of any one of these causes could have broken that 
chain and prevented the blackout. This definition distinguishes causes as a subset of a broader category 
of identified deficiencies. Other deficiencies are noted in the next section; they may have been 
contributing factors leading to the blackout or may present serious reliability concerns completely 
unrelated to the blackout, but they were not deemed by the investigators to be direct causes of the 
blackout. They are still important, however, because they might have caused a blackout under a different 
set of circumstances. 

1. Causes of the Blackout 

Group 1 Causes: FE lacked situational awareness of line outages and degraded conditions on the FE 
system. The first five causes listed below collectively resulted in a lack of awareness by the FE system 
operators that line outages were occurring on the FE system and that operating limit violations existed 
after the trip of the Chamberlin-Harding line at 15:05 and worsened with subsequent line trips. This lack 
of situational awareness precluded the FE system operators from taking corrective actions to return the 
system to within limits, and from notifying MISO and neighboring systems of the degraded system 
conditions and loss of critical functionality in the control center. 

Cause 1a: FE had no alarm failure detection system. Although the FE alarm processor stopped 
functioning properly at 14:14, the computer support staff remained unaware of this failure until the 
second EMS server failed at 14:54, some 40 minutes later. Even at 14:54, the responding support 
staff understood only that all of the functions normally hosted by server H4 had failed, and did not 
realize that the alarm processor had failed 40 minutes earlier. Because FE had no periodic diagnostics 
to evaluate and report the state of the alarm processor, nothing about the eventual failure of two EMS 
servers would have directly alerted the support staff that the alarms had failed in an infinite loop 
lockup — or that the alarm processor had failed in this manner both earlier and independently of the 
server failure events. Even if the FE computer support staff had communicated the EMS failure to 
the operators (which they did not) and fully tested the critical functions after restoring the EMS 
(which they did not), there still would have been a minimum of 40 minutes, from 14:14 to 14:54, 
during which the support staff was unaware of the alarm processor failure. 
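Cause 1a turns on the absence of a periodic diagnostic for the alarm processor. One common way to provide such a diagnostic is a heartbeat watchdog. The sketch below is hypothetical and is not based on FE's EMS design: it assumes the alarm-processing loop records a heartbeat on every cycle, so a processor locked in an infinite loop stops producing heartbeats, and the two-minute timeout is an arbitrary illustrative value.

```python
from dataclasses import dataclass

def minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' clock time into minutes after midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

@dataclass
class AlarmProcessorWatchdog:
    """Hypothetical periodic diagnostic for an alarm processor.

    Assumes the alarm-processing loop calls heartbeat() once per cycle.
    """
    timeout_min: int
    last_heartbeat_min: int

    def heartbeat(self, now_min: int) -> None:
        self.last_heartbeat_min = now_min

    def check(self, now_min: int) -> bool:
        """Return True (alert support staff and operators) if the heartbeat is stale."""
        return now_min - self.last_heartbeat_min > self.timeout_min

watchdog = AlarmProcessorWatchdog(timeout_min=2, last_heartbeat_min=minutes("14:14"))
print(watchdog.check(minutes("14:15")))  # False: still within the timeout
print(watchdog.check(minutes("14:17")))  # True: stale heartbeat is flagged
```

Under these assumptions, a processor that stopped cycling at 14:14 would be flagged within a few minutes instead of remaining undetected through the 40 minutes before the 14:54 server failure.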

Cause 1b: FE computer support staff did not effectively communicate the loss of alarm 
functionality to the FE system operators after the alarm processor failed at 14:14, nor did they 
have a formal procedure to do so. Knowing the alarm processor had failed would have provided FE 
operators the opportunity to detect the Chamberlin-Harding line outage shortly after 15:05 using 
supervisory displays still available in their energy management system. Knowledge of the 
Chamberlin-Harding line outage would have enabled FE operators to recognize worsening conditions 
on the FE system and to consider manually reclosing the Chamberlin-Harding line as an emergency 
action after subsequent outages of the Hanna-Juniper and Star-South Canton 345-kV lines. 

Knowledge of the alarm processor failure would have allowed the FE operators to be more receptive 
to information being received from MISO and neighboring systems regarding degrading conditions 
on the FE system. This knowledge would also have allowed FE operators to warn MISO and 
neighboring systems of the loss of a critical monitoring function in the FE control center computers, 
putting them on alert to more closely monitor conditions on the FE system, although there is not a 
specific procedure requiring FE to warn MISO of a loss of a critical control center function. The FE 
operators were complicit in this deficiency by not recognizing that the alarm processor had failed,
even though no new alarms were received by the operators after 14:14. A period of more than 90
minutes elapsed before the operators began to suspect a loss of the alarm processor, a period in which,
on a typical day, scores of routine alarms would be expected to print to the alarm logger. 
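The silence itself was evidence. Under a simple Poisson model of routine alarm arrivals (an assumption, since real alarm traffic is burstier), the probability of zero alarms in a window of t minutes at an average rate of λ alarms per hour is exp(-λt/60). The sketch below applies that model. The rate of 20 alarms per hour is a hypothetical value chosen to be consistent with "scores" of alarms over 90 minutes, not a figure from the investigation, and the function names are illustrative.

```python
import math


def prob_no_alarms(rate_per_hour: float, minutes: float) -> float:
    """Probability of zero alarms in a window, assuming Poisson arrivals."""
    expected = rate_per_hour * minutes / 60.0
    return math.exp(-expected)


def silence_is_suspicious(rate_per_hour: float, minutes: float,
                          threshold: float = 1e-3) -> bool:
    """Flag a silent period whose probability under normal traffic is tiny."""
    return prob_no_alarms(rate_per_hour, minutes) < threshold


# Hypothetical normal rate of 20 routine alarms per hour:
print(prob_no_alarms(20, 90))          # about 9.4e-14 for a 90-minute silence
print(silence_is_suspicious(20, 10))   # False: a 10-minute lull is plausible
print(silence_is_suspicious(20, 30))   # True
```

Under these assumptions, a silent alarm logger would have become statistically implausible within roughly half an hour, well before the 90 minutes that actually elapsed.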

Cause 1c: FE control center computer support staff did not fully test the functionality of
applications, including the alarm processor, after a server failover and restore. After the FE 
computer support staff conducted a warm reboot of the energy management system to get the failed 
servers operating again, they did not conduct a sufficiently rigorous test of critical energy 
management system applications to determine that the alarm processor failure still existed. Full 
testing of all critical energy management functions after restoring the servers would have detected the 
alarm processor failure as early as 15:08 and would have cued the FE system operators to use an 
alternate means to monitor system conditions. Knowledge that the alarm processor was still failed 
after the server was restored would have enabled FE operators to proactively monitor system 
conditions, become aware of the line outages occurring on the system, and act on operational 
information that was received. Knowledge of the alarm processor failure would also have allowed FE 
operators to warn MISO and neighboring systems, assuming there was a procedure to do so, of the 
loss of a critical monitoring function in the FE control center computers, putting them on alert to 
more closely monitor conditions on the FE system. 
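A post-restore test is most useful when it exercises each critical function end to end, rather than confirming only that the servers are running. The following is a minimal sketch that assumes hypothetical probe functions for each application. The harness runs every probe, treats an exception as a failure, and returns the names of the functions that did not pass. A restored server whose alarm path is still broken is therefore reported as failed rather than assumed healthy. The probe names and the test-alarm identifier are invented for illustration.

```python
from typing import Callable, Dict, List


def verify_after_restore(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each critical-function probe and return the names that failed.

    A probe that raises is counted as a failure rather than stopping the run,
    so one broken application cannot hide the status of the others.
    """
    failed = []
    for name, probe in checks.items():
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Hypothetical probes. The alarm probe injects a test alarm and checks that
# it reaches the alarm log; here the alarm path is simulated as still stuck.
alarm_log: List[str] = []


def alarm_end_to_end() -> bool:
    test_id = "TEST-ALARM-001"
    # A healthy alarm processor would append test_id to alarm_log here.
    return test_id in alarm_log


def scada_telemetry() -> bool:
    return True  # stand-in for a real telemetry freshness check


print(verify_after_restore({
    "alarm processor": alarm_end_to_end,
    "SCADA telemetry": scada_telemetry,
}))   # ['alarm processor']
```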

Cause 1d: FE operators did not have an effective alternative to easily visualize the overall
conditions of the system once the alarm processor failed. An alternative means of readily 
visualizing overall system conditions, including the status of critical facilities, would have enabled FE 
operators to become aware of forced line outages in a timely manner even though the alarms were 
non-functional. Typically, a dynamic map board or other type of display could provide a system 
status overview for quick and easy recognition by the operators. As with the prior causes, this 
deficiency precluded FE operators from detecting the degrading system conditions, taking corrective 
actions, and alerting MISO and neighboring systems. 

Cause 1e: FE did not have an effective contingency analysis capability cycling periodically on-line
and did not have a practice of running contingency analysis manually as an effective alternative 
for identifying contingency limit violations. Real-time contingency analysis, cycling automatically 
every 5-15 minutes, would have alerted the FE operators to degraded system conditions following the 
loss of the Eastlake 5 generating unit and the Chamberlin-Harding 345-kV line. Initiating a manual 
contingency analysis after the trip of the Chamberlin-Harding line could also have identified the 
degraded system conditions for the FE operators. Knowledge of a contingency limit violation after 
the loss of Chamberlin-Harding and knowledge that conditions continued to worsen with the 
subsequent line losses would have allowed the FE operators to take corrective actions and notify 
MISO and neighboring systems of the developing system emergency. After the trip of the
Chamberlin-Harding 345-kV line at 15:05, FE was operating in a condition in which the loss of the
Perry 1 nuclear unit would have caused one or more lines to exceed their emergency ratings.
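Real-time contingency analysis asks, for each credible single outage, whether the remaining facilities would stay within their emergency ratings. A common fast screening method is a DC approximation and not necessarily what FE's EMS used. It estimates the post-contingency flow on a monitored line l after line k trips as f_l + LODF(l,k) * f_k, where LODF is the line outage distribution factor. The sketch below applies that rule. The line names A, B and C, along with their flows, ratings and factors, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Line:
    name: str
    flow_mw: float               # pre-contingency flow
    emergency_rating_mw: float


def screen_n_minus_1(lines: List[Line],
                     lodf: Dict[Tuple[str, str], float]) -> List[Tuple[str, str, float]]:
    """Return (outaged line, overloaded line, post-contingency MW) violations.

    lodf[(monitored, outaged)] is the fraction of the outaged line's
    pre-outage flow that shifts onto the monitored line; missing pairs are 0.
    """
    violations = []
    for outaged in lines:                # each single-line contingency
        for monitored in lines:          # each remaining line
            if monitored.name == outaged.name:
                continue
            factor = lodf.get((monitored.name, outaged.name), 0.0)
            post = monitored.flow_mw + factor * outaged.flow_mw
            if abs(post) > monitored.emergency_rating_mw:
                violations.append((outaged.name, monitored.name, round(post, 1)))
    return violations


# Hypothetical three-line example (all values invented):
lines = [Line("A", 900, 1200), Line("B", 800, 1000), Line("C", 500, 1100)]
lodf = {("A", "B"): 0.5, ("B", "A"): 0.4, ("C", "A"): 0.3,
        ("C", "B"): 0.2, ("A", "C"): 0.3, ("B", "C"): 0.25}
print(screen_n_minus_1(lines, lodf))
# [('A', 'B', 1160.0), ('B', 'A', 1300.0)]
```

Running a screen like this automatically every 5-15 minutes, and again after each topology change, is what would surface a contingency limit violation without relying on an operator to start a study by hand.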

Group 2 Cause: FE did not effectively manage vegetation in its transmission rights-of-way. 

Cause 2: FE did not effectively manage vegetation in its transmission line rights-of-way. The lack 
of situational awareness resulting from Causes 1a-1e would have allowed a number of system failure
modes to go undetected. However, it was the fact that FE allowed trees growing in its 345-kV 
transmission rights-of-way to encroach within the minimum safe clearances from energized 
conductors that caused the Chamberlin-Harding, Hanna-Juniper, and Star-South Canton 345-kV line 
outages. These three tree-related outages triggered the localized cascade of the Cleveland-Akron 
138-kV system and the overloading and tripping of the Sammis-Star line, eventually snowballing into 
an uncontrolled wide-area cascade. These three lines experienced non-random, common mode 
failures due to unchecked tree growth. With properly cleared rights-of-way and calm weather, such 
as existed in Ohio on August 14, the chances of those three lines randomly tripping within 30 minutes
are extremely small. Effective vegetation management practices would have avoided this particular
sequence of line outages that triggered the blackout. However, effective vegetation management 
might not have precluded other latent failure modes. For example, investigators determined that there 
was an elevated risk of a voltage collapse in the Cleveland-Akron area on August 14 if the Perry 1 
nuclear plant had tripped that afternoon in addition to Eastlake 5, because the transmission system in 
the Cleveland-Akron area was being operated with low bus voltages and insufficient reactive power 
margins to remain stable following the loss of Perry 1. 
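The contrast between random and common-mode failures can be made concrete with a back-of-the-envelope calculation. Purely for illustration, assume that each 345-kV line has an independent random forced-outage rate of 2 per year. This is a hypothetical figure, not one from the investigation. If outages are modeled as Poisson events, the probability that all three specific lines trip in the same fixed 30-minute window is the product of the individual probabilities. Using a fixed window simplifies "within 30 minutes of each other".

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def prob_trip_in_window(rate_per_year: float, minutes: float) -> float:
    """P(at least one forced outage of a line in the window), Poisson model."""
    return 1.0 - math.exp(-rate_per_year * minutes / MINUTES_PER_YEAR)


# Hypothetical rate of 2 random outages per line per year; independence assumed.
p_one = prob_trip_in_window(2.0, 30)
p_all_three = p_one ** 3
print(f"{p_one:.2e}")        # 1.14e-04
print(f"{p_all_three:.1e}")  # 1.5e-12
```

Shared exposure to unchecked tree growth breaks the independence assumption behind this product, which is why the three outages are better described as common-mode failures than as coincidence.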

Group 3 Causes: Reliability coordinators did not provide effective diagnostic support. 

Cause 3a: MISO was using non-real-time information to monitor real-time operations in its area 
of responsibility. MISO was using its Flowgate Monitoring Tool (FMT) as an alternative method of 
observing the real-time status of critical facilities within its area of responsibility. However, the FMT 
was receiving information on facility outages from the NERC SDX, which is not intended as a real-time
information system and is not required to be updated in real-time. Therefore, without real-time
outage information, the MISO FMT was unable to accurately estimate real-time conditions within the 
MISO area of responsibility. If the FMT had received accurate line outage distribution factors 
representing current system topology, it would have identified a contingency overload on the Star-Juniper
345-kV line for the loss of the Hanna-Juniper 345-kV line as early as 15:10. This information
would have enabled MISO to alert FE operators regarding the contingency violation and would have 
allowed corrective actions by FE and MISO. The reliance on non-real-time facility status information 
from the NERC SDX is not limited to MISO; others in the Eastern Interconnection use the same SDX 
information to calculate TLR curtailments in the IDC and make operational decisions on that basis.
What was unique compared to other reliability coordinators on that day was MISO’s reliance on the
SDX for what it intended to be a real-time system monitoring tool.
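The line outage distribution factors mentioned above are usually derived from the power transfer distribution factors (PTDFs) of the pre-outage network. In the standard DC formulation, consider an outage of line k connecting buses m and n, and a monitored line l:

```latex
\mathrm{LODF}_{l,k} = \frac{\mathrm{PTDF}_{l,mn}}{1 - \mathrm{PTDF}_{k,mn}},
\qquad
f_l^{(k)} = f_l + \mathrm{LODF}_{l,k}\, f_k
```

In these expressions:

- PTDF_{l,mn} is the sensitivity of the flow on line l to a transfer injected at bus m and withdrawn at bus n.
- f_l and f_k are the pre-outage flows.
- f_l^{(k)} is the estimated flow on line l after line k trips.

Because PTDFs are computed from the network topology, factors built for a topology that still shows a tripped line as in service will misallocate flow. That is the sense in which non-real-time outage data left the FMT unable to estimate real-time conditions.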

Cause 3b: MISO did not have real-time topology information for critical lines mapped into its state 
estimator. The MISO state estimator and network analysis tools were still considered to be in 
development on August 14 and were not fully capable of automatically recognizing changes in the 
configuration of the modeled system. Following the trip of lines in the Cinergy system at 12:12 and 
the DP&L Stuart-Atlanta line at 14:02, the MISO state estimator failed to solve correctly as a result of 
large numerical mismatches. MISO real-time contingency analysis, which operates only if the state 
estimator solves, did not operate properly in automatic mode again until after the blackout. Without 
real-time contingency analysis information, the MISO operators did not detect that the FE system was 
in a contingency violation after the Chamberlin-Harding 345-kV line tripped at 15:05. Since MISO 
was not aware of the contingency violation, MISO did not inform FE and thus FE’s lack of situational 
awareness described in Causes 1a-1e was allowed to continue. With an operational state estimator and
real-time contingency analysis, MISO operators would have known of the contingency violation and 
could have informed FE, thus enabling FE and MISO to take timely actions to return the system to 
within limits. 
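A state estimator fits the network model to telemetered measurements. An unmodeled line outage therefore shows up as measurements the model cannot reconcile. The sketch below illustrates this with a textbook DC weighted-least-squares estimator on a hypothetical three-bus network, and it is not MISO's estimator. The network has equal line susceptances, uses bus 1 as the angle reference and has three line-flow measurements. All values and names are invented for illustration. When line 2-3 is actually open but still modeled as closed, the chi-square objective J rises far above about 6.63, the 99% threshold for one degree of freedom.

```python
B = 10.0       # hypothetical per-unit susceptance of every line
SIGMA = 0.01   # assumed measurement standard deviation (per unit)

# Measurement model: each row maps the unknown angles [theta2, theta3]
# (theta1 = 0 as reference) to one measured line flow.
H = [
    [-B, 0.0],   # P12 = B * (theta1 - theta2)
    [0.0, -B],   # P13 = B * (theta1 - theta3)
    [B, -B],     # P23 = B * (theta2 - theta3), still modeled as in service
]


def wls_estimate(h, z):
    """Solve the normal equations (H^T H) x = H^T z for two unknowns.

    With equal measurement weights, the weights cancel out of the solution.
    """
    a = sum(r[0] * r[0] for r in h)
    b = sum(r[0] * r[1] for r in h)
    d = sum(r[1] * r[1] for r in h)
    g0 = sum(r[0] * zi for r, zi in zip(h, z))
    g1 = sum(r[1] * zi for r, zi in zip(h, z))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]


def objective(h, z, x):
    """Chi-square statistic J = sum of squared normalized residuals."""
    residuals = [zi - (r[0] * x[0] + r[1] * x[1]) for r, zi in zip(h, z)]
    return sum((ri / SIGMA) ** 2 for ri in residuals)


z_ok = [1.0, 2.0, 1.0]        # flows consistent with the modeled topology
z_line_out = [1.0, 2.0, 0.0]  # line 2-3 actually open, so it measures zero
j_ok = objective(H, z_ok, wls_estimate(H, z_ok))
j_bad = objective(H, z_line_out, wls_estimate(H, z_line_out))
print(f"{j_ok:.1f}")   # 0.0
print(f"{j_bad:.0f}")  # 3333, far above the ~6.63 threshold
```

Detecting the mismatch is only half the job. The estimator's topology processor must also pick up the breaker status change so that contingency analysis runs on the real network.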

Cause 3c: The PJM and MISO reliability coordinators lacked an effective procedure on when and 
how to coordinate an operating limit violation observed by one of them in the other's area due to a 
contingency near their common boundary. The lack of such a procedure caused ineffective 
communications between PJM and MISO regarding PJM’s awareness of a possible overload on the 
Sammis-Star line as early as 15:48. An effective procedure would have enabled PJM to more clearly 
communicate the information it had regarding limit violations on the FE system, and would have 
enabled MISO to be aware of those conditions and initiate corrective actions with FE. 
The deficiencies listed above were determined by investigators to be necessary and sufficient to cause the
August 14 blackout — therefore they are labeled causes. Investigators identified many other deficiencies, 
which did not meet the “necessary and sufficient” test, and therefore were not labeled as causes of the 
blackout. In other words, a sufficient set of deficiencies already existed to cause the blackout without 
these other deficiencies. 

However, these other deficiencies represent significant conclusions of the investigation, as many of them 
aggravated the enabling conditions or the severity of the consequences of the blackout. An example is the 
ninth deficiency listed below, regarding poor communications within the FE control center. Poor 
communications within the control center did not cause the blackout and the absence of those poor 
communications within the FE control center would not have prevented the blackout. However, poor 
communications in the control center were a contributing factor, because they increased the state of confusion
in the control center and exacerbated the FE operators’ lack of situational awareness. The investigators 
also discovered a few of these deficiencies to be unrelated to the blackout but still of significant concern 
to system reliability. An example is deficiency number eight: FE was operating close to a voltage 
collapse in the Cleveland-Akron area, although voltage collapse did not initiate the sequence of events 
that led to the blackout. 

1. Summary of Other Deficiencies Identified in the Blackout Investigation 

1. The NERC and ECAR compliance programs did not identify and resolve specific 
compliance violations before those violations led to a cascading blackout. Several entities in 
the ECAR region violated NERC operating policies and planning standards, and those violations 
contributed directly to the start of the cascading blackout. Had those violations not occurred, the 
blackout would not have occurred. The approach used for monitoring and assuring compliance 
with NERC and regional reliability standards prior to August 14 delegated much of the 
responsibility and accountability to the regional level. Due to confidentiality considerations, 
NERC did not receive detailed information about violations of specific parties prior to August 14. 
This approach meant that the NERC compliance program was only as effective as that of the 
weakest regional reliability council. 

2. There are no commonly accepted criteria that specifically address safe clearances of 
vegetation from energized conductors. The National Electrical Safety Code specifies in detail 
criteria for clearances from several classes of obstructions, including grounded objects. However, 
criteria for vegetation clearances vary by state and province, and by individual utility. 

3. Problems identified in studies of prior large-scale blackouts were repeated on August 14, 
including deficiencies in vegetation management, operator training, and tools to help 
operators better visualize system conditions. Although these issues had been previously 
reported, NERC and some regions did not have a systematic approach to tracking successful 
implementation of those prior recommendations. 

4. Reliability coordinators and control areas have adopted differing interpretations of the 
functions, responsibilities, authorities, and capabilities needed to operate a reliable power 
system. For example, MISO delegated substantial portions of its reliability oversight functions to 
its member control areas and did not provide a redundant set of eyes adequate for monitoring a 
wide-area view of reliability in its area of responsibility. Further, NERC operating policies do 
not specify what tools are specifically required of control areas and reliability coordinators, such 
as state estimation and network analysis tools, although the policies do specify the expected 
outcomes of analysis. 
5. In ECAR, data used to model loads and generators were inaccurate due to a lack of
verification through benchmarking with actual system data and field testing. Inaccuracies in 
load models and other system modeling data frustrated investigators trying to develop accurate 
simulations of the events on August 14. Inaccurate model data introduces potential errors in 
planning and operating models. Further, the lack of synchronized data recorders made the 
reconstruction of the sequence of events very difficult. 

6. In ECAR, planning studies, design assumptions, and facilities ratings were not consistently 
shared and were not subject to adequate peer review among operating entities and regions.
As a result, systems were studied and analyzed in “silos” and study assumptions and results were
not always understood by neighboring systems, although those assumptions affected those other 
systems. 

7. Available system protection technologies were not consistently applied to optimize the 
ability to slow or stop an uncontrolled cascading failure of the power system. The effects of 
zone 3 relays, the lack of under-voltage load shedding, and the coordination of underfrequency
load shedding and generator protection are all areas requiring further investigation to determine if 
opportunities exist to limit or slow the spread of a cascading failure of the system. 

8. FE was operating its system with voltages below critical voltages and with inadequate 
reactive reserve margins. FE did not retain and apply knowledge from earlier system studies 
concerning voltage collapse concerns in the Cleveland-Akron area. Conventional voltage studies 
done by FE to assess normal and abnormal voltage ranges and percent voltage decline did not 
accurately determine an adequate margin between post-contingency voltage and the voltage 
collapse threshold at various locations in their system. If FE had conducted voltage stability 
analyses using well-established P-V and Q-V techniques, FE would have detected insufficient 
dynamic reactive reserves at various locations in their system for the August 14 operating 
scenario that includes the Eastlake 5 outage. Additionally, FE’s stated acceptable ranges for 
voltage are not compatible with neighboring systems or interconnected systems in general. FE 
was operating in apparent violation of its own historical planning and operating criteria that were 
developed and used by Centerior Energy Corporation (The Cleveland Electric Illuminating
Company and the Toledo Edison Company) prior to 1998 to meet the relevant NERC and ECAR
standards and criteria. In 1999, FE reduced its operating voltage lower limits in the Cleveland-Akron
area compared to those criteria used in prior years. These reduced minimum operating
voltage limits were disclosed in FE’s 1999-2003 Planning & Operating Criteria Form 715 
submittal to FERC, but were not challenged at the time. 

9. FE did not have an effective protocol for sharing operator information within the control 
room and with others outside the control room. FE did not have an effective plan for 
communications in the control center during a system emergency. Communications within the 
control center and with others outside the control center were confusing and hectic. The 
communications were not effective in helping the operators focus on the most urgent problem in 
front of them — the emerging system and computer failures. 

10. FE did not have an effective generation redispatch plan and did not have sufficient 
redispatch resources to relieve overloaded transmission lines supplying northeastern Ohio.
Following the loss of the Chamberlin-Harding 345-kV line, FE had a contingency limit violation
but did not have resources available for redispatch to effectively reduce the contingency overload 
within 30 minutes. 

11. FE did not have an effective load reduction plan and did not have an adequate load 
reduction capability, whether automatic or manual, to relieve overloaded transmission lines 
supplying northeastern Ohio. A system operator is required to have adequate resources to
restore the system to a secure condition within 30 minutes or less of a contingency. Analysis
shows that shedding 2,000 MW of load in the Cleveland-Akron area after the loss of the Star-South
Canton 345-kV line, or shedding 2,500 MW after the West Akron Substation 138-kV bus failure,
could have halted the cascade in the northeastern Ohio area.

12. FE did not adequately train its operators to recognize and respond to system emergencies, 
such as multiple contingencies. The FE operators did not recognize the information they were 
receiving as clear indications of an emerging system emergency. Even when the operators 
grasped the idea that their computer systems had failed and the system was in trouble, the 
operators did not formally declare a system emergency and inform MISO and neighboring 
systems. 

13. FE did not have the ability to transfer control of its power system to an alternate center or 
authority during system emergencies. FE had not arranged for a backup control center or 
backup system control and monitoring functions. A typical criterion would include the need to 
evacuate the control center due to fire or natural disaster. Although control center evacuation was 
not required on August 14, FE had an equivalent situation with the loss of its critical monitoring 
and control functionality in the control center. 

14. FE operational planning and system planning studies were not sufficiently comprehensive 
to ensure reliability because they did not include a full range of sensitivity studies based on 
the 2003 Summer Base Case. A comprehensive range of planning studies would have involved 
analyses of all operating scenarios likely to be encountered, including those for unusual operating 
conditions and potential disturbance scenarios. 

15. FE did not perform adequate hour-ahead operations planning studies after Eastlake 5 
tripped off-line at 13:31 to ensure that FE could maintain a 30-minute response capability 
for the next contingency. The FE system was not within single contingency limits from 15:06 to 
16:06. In addition to day-ahead planning, the system should have been restudied after the forced 
outage of Eastlake 5. 

16. FE did not perform adequate day-ahead operations planning studies to ensure that FE had 
adequate resources to return the system to within contingency limits following the possible 
loss of their largest unit, Perry 1. After Eastlake 4 was forced out on August 13, the operational 
plan was not modified for the possible loss of the largest generating unit, Perry 1. 

17. FE did not have or use specific criteria for declaring a system emergency. 

18. ECAR and MISO did not precisely define “critical facilities” such that the 345-kV lines in 
FE that caused a major cascading failure would have to be identified as critical facilities for 
MISO. MISO’s procedure in effect on August 14 was to request FE to identify critical facilities 
on its system to MISO. 

19. MISO did not have additional monitoring tools that provided high-level visualization of the 
system. A high-level monitoring tool, such as a dynamic map board, would have enabled MISO 
operators to view degrading conditions in the FE system. 

20. ECAR and its member companies did not adequately follow ECAR Document 1 to conduct 
regional and interregional system planning studies and assessments. This would have 
enabled FE to further develop specific operating limits of their critical interfaces by assessing the
effects of power imports and exports, and regional and interregional power transfers. 

21. ECAR did not have a coordinated procedure to develop and periodically review reactive 
power margins. This would have enabled all member companies to establish maximum power 
transfer levels and minimum operating voltages to respect these reactive margins. 

22. Operating entities and reliability coordinators demonstrated an over-reliance on the 
administrative levels of the TLR procedure to remove contingency and actual overloads, 
when emergency redispatch or other emergency actions were necessary. TLR is a market-based
congestion relief procedure and is not intended for removing an actual violation in real time.

23. Numerous control areas in the Eastern Interconnection, including FE, were not correctly 
tagging dynamic schedules, resulting in large mismatches between actual, scheduled, and 
tagged interchange on August 14. This prevented reliability coordinators in the Eastern 
Interconnection from predicting and modeling the effects of these transactions on the grid. 
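Item 8 above refers to P-V and Q-V analysis. The idea can be seen in the simplest possible case: a load P + jQ supplied from a fixed source voltage E through a lossless reactance X. Solving the power-flow equations for the load voltage V gives the textbook result V^2 = E^2/2 - QX ± sqrt(E^4/4 - X^2 P^2 - XQE^2). The upper root is the normal operating point. The two roots meet at the "nose" of the P-V curve, beyond which no solution exists and voltage collapses. The following is a minimal sketch, with per-unit values E = 1.0 and X = 0.5 chosen only for illustration. The function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def load_voltage(p: float, q: float, e: float = 1.0, x: float = 0.5) -> Optional[float]:
    """Upper (stable) load voltage for a two-bus system, per unit.

    Uses V^2 = E^2/2 - Q*X + sqrt(E^4/4 - X^2*P^2 - X*Q*E^2); returns None
    past the nose of the P-V curve, where no real solution exists.
    """
    disc = e**4 / 4 - x**2 * p**2 - x * q * e**2
    if disc < 0:
        return None
    return math.sqrt(e**2 / 2 - q * x + math.sqrt(disc))


def pv_curve(q_over_p: float = 0.0, step: float = 0.01) -> List[Tuple[float, float]]:
    """Trace (P, V) points at a constant Q/P ratio until the nose is passed."""
    points = []
    i = 0
    while True:
        p = i * step
        v = load_voltage(p, p * q_over_p)
        if v is None:
            return points
        points.append((p, v))
        i += 1


curve = pv_curve()
p_nose, v_nose = curve[-1]
print(f"max transfer ~ {p_nose:.2f} pu at V ~ {v_nose:.3f} pu")
# max transfer ~ 1.00 pu at V ~ 0.707 pu
```

Even in this idealized case, the voltage at the nose is still about 0.71 pu. Shunt compensation typically raises the nose voltage further. For this reason, checking voltage magnitude or percent voltage decline alone can miss how little margin remains before collapse, while tracing the curve to its nose measures that margin directly.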

D. Blackout Recommendations 

1. NERC Recommendations Approved February 10, 2004 

On February 10, 2004, the NERC Board of Trustees approved 14 recommendations offered by the NERC 
Steering Group to address the causes of the August 14 blackout and other deficiencies. These 
recommendations remain valid and applicable to the conclusions of this final report. The 
recommendations fall into three categories: 

Actions to Remedy Specific Deficiencies: Specific actions directed to FE, MISO, and PJM to correct the 
deficiencies that led to the blackout. 

• Correct the direct causes of the August 14, 2003, blackout. 

Strategic Initiatives: Strategic initiatives by NERC and the regional reliability councils to strengthen 
compliance with existing standards and to formally track completion of recommended actions from 
August 14, and other significant power system events. 

• Strengthen the NERC Compliance Enforcement Program. 

• Initiate control area and reliability coordinator reliability readiness audits. 

• Evaluate vegetation management procedures and results. 

• Establish a program to track implementation of recommendations. 

Technical Initiatives: Technical initiatives to prevent or mitigate the impacts of future cascading 
blackouts. 


• Improve operator and reliability coordinator training. 

• Evaluate reactive power and voltage control practices. 

• Improve system protection to slow or limit the spread of future cascading outages. 


July 13, 2004 


101 




August 14, 2003, Blackout 
Final NERC Report 


Section V 

Conclusions and Recommendations 


• Clarify reliability coordinator and control area functions, responsibilities, capabilities, and 
authorities. 

• Establish guidelines for real-time operating tools. 

• Evaluate lessons learned during system restoration. 

• Install additional time-synchronized recording devices as needed. 

• Reevaluate system design, planning, and operating criteria. 

• Improve system modeling data and data exchange practices. 

2. U.S.-Canada Power System Outage Task Force Recommendations 

On April 5, 2004, the U.S.-Canada Power System Outage Task Force issued its final report on the August
14 blackout, containing 46 recommendations. The recommendations were grouped into four areas:

Group 1: Institutional Issues Related to Reliability (Recommendations 1-14) 

Group 2: Support and Strengthen NERC’s Actions of February 10, 2004 (Recommendations 15-31) 

Group 3: Physical and Cyber Security of North American Bulk Power Systems (Recommendations 
32-44) 

Group 4: Canadian Nuclear Power Sector (Recommendations 45-46) 

The investigation team is encouraged by the recommendations of the Task Force and believes these 
recommendations are consistent with the conclusions of the NERC investigation. Although the NERC 
investigation has focused on a technical analysis of the blackout, the policy recommendations in Group 1 
appear to support many of NERC’s findings regarding the need for legislation for enforcement of 
mandatory reliability standards. In other recommendations, the Task Force seeks to strengthen 
compliance enforcement and other NERC functions by advancing reliability policies at the federal, state, 
and provincial levels. 

The second group of Task Force recommendations builds upon the original fourteen NERC 
recommendations approved in February 2004. NERC has considered these expanded recommendations, 
is implementing these recommendations where appropriate, and will inform the Task Force if additional 
considerations make any recommendation inappropriate or impractical. 

The third group of Task Force recommendations addresses critical infrastructure protection issues. NERC 
agrees with the conclusions of the Task Force (Final Task Force report, page 132) that there is “no 
evidence that a malicious cyber attack was a direct or indirect cause of the August 14, 2003, power 
outage.” The recommendations of the Task Force report are forward-looking and address issues that 
should be considered, whether or not there had been a blackout on August 14. NERC has assigned its 
Critical Infrastructure Protection Committee to evaluate these recommendations and report what actions, 
if any, NERC should take to implement those recommendations. 

The fourth group of recommendations is specific to Canadian nuclear facilities and is outside the scope of 
NERC responsibilities. 

Additional NERC Recommendations 

While the ongoing NERC investigation has confirmed the validity of the original fourteen NERC 
recommendations and the NERC Steering Group concurs with the Task Force’s recommendations, four 
additional NERC recommendations resulted from further investigation since February and in
consideration of the Task Force final report. The additional NERC recommendations are as follows: 

Recommendation 4d — Develop a standard on vegetation clearances. 

The Planning Committee, working with the Standards Authorization Committee, shall develop a 
measurable standard that specifies the minimum clearances between energized high voltage lines and 
vegetation. Appropriate criteria from the National Electrical Safety Code, or other appropriate code, 
should be adapted and interpreted so as to be applicable to vegetation. 

Recommendation 15 — Develop a standing capability for NERC to investigate future blackouts and 
disturbances. 

NERC shall develop and be prepared to implement a NERC standing procedure for investigating future 
blackouts and system disturbances. Many of the methods, tools, and lessons from the investigation of the 
August 14 blackout are appropriate for adoption. 

Recommendation 16 — Accelerate the standards transition. 

NERC shall accelerate the transition from existing operating policies, planning standards, and compliance 
templates to a clear and measurable set of reliability standards. (This recommendation is consistent with 
the Task Force recommendation 25.) 

Recommendation 17 — Evaluate NERC actions in the areas of cyber and physical security 

The Critical Infrastructure Protection Committee shall evaluate the U.S.-Canada Power System Outage 
Task Force’s Group III recommendations to determine if any actions are needed by NERC and report a 
proposed action plan to the board. 

Action Plan 

NERC will develop a mechanism to track all of the NERC, Task Force, and other reliability 
recommendations resulting from subsequent investigations of system disturbances and compliance 
reviews. Details of that plan are outside the scope of this report. 

3. Complete Set of NERC Recommendations 

This section consolidates all NERC recommendations, including the initial 14 recommendations approved 
in February 2004 and the four additional recommendations described above, into a single place. 

Recommendation 1: Correct the Direct Causes of the August 14, 2003, Blackout. 

The principal causes of the blackout were that FE did not maintain situational awareness of conditions on 
its power system and did not adequately manage tree growth in its transmission rights-of-way. 
Contributing factors included ineffective diagnostic support provided by MISO as the reliability 
coordinator for FE and ineffective communications between MISO and PJM. 

NERC has taken immediate actions to ensure that the deficiencies that were directly causal to the August 
14 blackout are corrected. These steps are necessary to assure electricity customers, regulators, and 
others with an interest in the reliable delivery of electricity that the power system is being operated in a 
manner that is safe and reliable, and that the specific causes of the August 14 blackout have been 
identified and fixed. 


Recommendation 1a: FE, MISO, and PJM shall each complete the remedial actions designated in
Attachment A for their respective organizations and certify to the NERC board no later than June 
30, 2004, that these specified actions have been completed. Furthermore, each organization shall 
present its detailed plan for completing these actions to the NERC committees for technical review 
on March 23-24, 2004, and to the NERC board for approval no later than April 2, 2004. 

Recommendation 1b: The NERC Technical Steering Committee shall immediately assign a team of
experts to assist FE, MISO, and PJM in developing plans that adequately address the issues listed 
in Attachment A, and other remedial actions for which each entity may seek technical assistance. 

Recommendation 2: Strengthen the NERC Compliance Enforcement Program. 

NERC’s analysis of the actions and events leading to the August 14 blackout leads it to conclude that 
several violations of NERC operating policies contributed directly to an uncontrolled, cascading outage 
on the Eastern Interconnection. NERC continues to investigate additional violations of NERC and 
regional reliability standards and will issue a final report of those violations once the investigation is 
complete. 

In the absence of enabling legislation in the United States and complementary actions in Canada and 
Mexico to authorize the creation of an electric reliability organization, NERC lacks legally sanctioned 
authority to enforce compliance with its reliability rules. However, the August 14 blackout is a clear 
signal that voluntary compliance with reliability rules is no longer adequate. NERC and the regional 
reliability councils must assume firm authority to measure compliance, to more transparently report 
significant violations that could risk the integrity of the interconnected power system, and to take 
immediate and effective actions to ensure that such violations are corrected. Although all violations are 
important, a significant violation is one that could directly reduce the integrity of the interconnected 
power systems or otherwise cause unfavorable risk to the interconnected power systems. By contrast, a 
violation of a reporting or administrative requirement would not by itself generally be considered a 
significant violation. 

Recommendation 2a: Each regional reliability council shall report to the NERC Compliance 
Enforcement Program within one month of occurrence all significant violations of NERC operating 
policies and planning standards and regional standards, whether verified or still under 
investigation. Such reports shall confidentially note details regarding the nature and potential 
reliability impacts of the alleged violations and the identity of parties involved. Additionally, each 
regional reliability council shall report quarterly to NERC, in a format prescribed by NERC, all 
violations of NERC and regional reliability council standards. 

Recommendation 2b: When presented with the results of the investigation of any significant 
violation, and with due consideration of the surrounding facts and circumstances, the NERC board 
shall require an offending organization to correct the violation within a specified time. If the board 
determines that an offending organization is non-responsive and continues to cause a risk to the 
reliability of the interconnected power systems, the board will seek to remedy the violation by 
requesting assistance of the appropriate regulatory authorities in the United States, Canada, and 
Mexico. 

Recommendation 2c: The Planning and Operating Committees, working in conjunction with the 
Compliance Enforcement Program, shall review and update existing approved and draft 
compliance templates applicable to current NERC operating policies and planning standards; and 
submit any revisions or new templates to the board for approval no later than March 31, 2004. To 
expedite this task, the NERC President shall immediately form a Compliance Template Task Force 
comprised of representatives of each committee. The Compliance Enforcement Program shall issue 
the board-approved compliance templates to the regional reliability councils for adoption into their 
compliance monitoring programs. 

This effort will make maximum use of existing approved and draft compliance templates in order to meet 
the aggressive schedule. The templates are intended to include all existing NERC operating policies and 
planning standards but can be adapted going forward to incorporate new reliability standards as they are 
adopted by the NERC board for implementation in the future. 

Recommendation 2d: The NERC Compliance Enforcement Program and ECAR shall, within three 
months of the issuance of the final report from the Compliance and Standards investigation team, 
evaluate violations of NERC and regional standards, as compared to previous compliance reviews 
and audits for the applicable entities, and develop recommendations to improve the compliance 
process. 

Recommendation 3: Initiate Control Area and Reliability Coordinator Reliability Readiness 
Audits. 

In conducting its investigation, NERC found that deficiencies in control area and reliability coordinator 
capabilities to perform assigned reliability functions contributed to the August 14 blackout. In addition to 
specific violations of NERC and regional standards, some reliability coordinators and control areas were 
deficient in the performance of their reliability functions and did not achieve a level of performance that 
would be considered acceptable practice in areas such as operating tools, communications, and training. 

In a number of cases, there was a lack of clarity in the NERC policies with regard to what is expected of a 
reliability coordinator or control area. Although the deficiencies in the NERC policies must be addressed 
(see Recommendation 9), it is equally important to recognize that standards cannot prescribe all aspects of 
reliable operation and that minimum standards present a threshold, not a target for performance. 

Reliability coordinators and control areas must perform well, particularly under emergency conditions, 
and at all times strive for excellence in their assigned reliability functions and responsibilities. 

Recommendation 3a: The NERC Compliance Enforcement Program and the regional reliability 
councils shall jointly establish a program to audit the reliability readiness of all reliability 
coordinators and control areas, with immediate attention given to addressing the deficiencies 
identified in the August 14 blackout investigation. Audits of all control areas and reliability 
coordinators shall be completed within three years and continue in a three-year cycle. The 20 
highest priority audits, as determined by the Compliance Enforcement Program, will be completed 
by June 30, 2004. 

Recommendation 3b: NERC will establish a set of baseline audit criteria to which regional criteria 
may be added. The control area requirements will be based on the existing NERC Control Area 
Certification Procedure. Reliability coordinator audits will include evaluation of reliability plans, 
procedures, processes, tools, personnel qualifications, and training. In addition to reviewing 
written documents, the audits will carefully examine the actual practices and preparedness of 
control areas and reliability coordinators. 

Recommendation 3c: The reliability regions, with the oversight and direct participation of NERC, 
will audit each control area’s and reliability coordinator’s readiness to meet these audit criteria. 
FERC and other relevant regulatory agencies will be invited to participate in the audits, subject to 
the same confidentiality conditions as the other members of the audit teams. 

Recommendation 4: Evaluate Vegetation Management Procedures and Results. 

Ineffective vegetation management was a major cause of the August 14 blackout and also contributed to 
other historical large-scale blackouts, like the one that occurred on July 2-3, 1996, in the West. 
Maintaining transmission line rights-of-way (ROW), including maintaining safe clearances of energized 
lines from vegetation, under-build, and other obstructions, incurs a substantial ongoing cost in many areas 
of North America. However, it is an important investment for assuring a reliable electric system. 
Vegetation, such as the trees that caused the initial line trips in FE that led to the August 14, 2003, outage, 
is not the only type of obstruction that can breach the safe clearance distances from energized lines. 
Other examples include under-build of telephone and cable TV lines, train crossings, and even nests of 
certain large bird species. 

NERC does not presently have standards for ROW maintenance. Standards on vegetation management 
are particularly challenging given the great diversity of vegetation and growth patterns across North 
America. However, NERC’s standards do require that line ratings be calculated so as to maintain safe 
clearances from all obstructions. Furthermore, in the United States, the National Electrical Safety Code 
(NESC) Rules 232, 233, and 234 detail the minimum vertical and horizontal safety clearances of 
overhead conductors from grounded objects and various types of obstructions. NESC Rule 218 addresses 
tree clearances by simply stating, “Trees that may interfere with ungrounded supply conductors should be 
trimmed or removed.” Several states have adopted their own electrical safety codes and similar codes 
apply in Canada. 

Recognizing that ROW maintenance requirements vary substantially depending on local conditions, 
NERC will focus attention on measuring performance as indicated by the number of high-voltage line 
trips caused by vegetation. This approach has worked well in the Western Electricity Coordinating 
Council (WECC) since being instituted after the 1996 outages. 

Recommendation 4a: NERC and the regional reliability councils shall jointly initiate a program to 
report all bulk electric system transmission line trips resulting from vegetation contact. The 
program will use the successful WECC vegetation monitoring program as a model. 

A line trip includes a momentary opening and reclosing of the line, a lockout, or a combination. For 
reporting purposes, all vegetation-related openings of a line occurring within one 24-hour period should 
be considered one event. Trips known to be caused by severe weather or other natural disasters such as 
earthquakes are excluded. Contact with vegetation includes both physical contact and arcing due to 
insufficient clearance. 

All transmission lines operating at 230-kV and higher voltage, and any other lower voltage lines 
designated by the regional reliability council to be critical to the reliability of the bulk electric system, 
shall be included in the program. 
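A minimal sketch of the counting rules above, assuming each line opening is logged with a cause code and a hypothetical `critical` flag for lower-voltage lines the region has designated. The report does not say where a 24-hour period begins; this sketch assumes it begins at the first opening of an event, so later openings of the same line within 24 hours fold into that event. The record layout and function name are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LineOpening:
    """One logged opening of a line (hypothetical record layout)."""
    line_id: str
    voltage_kv: float
    time: datetime
    cause: str              # e.g. "vegetation", "weather", "equipment"
    critical: bool = False  # lower-voltage line designated critical by the region

def count_vegetation_events(openings, window=timedelta(hours=24)):
    """Count reportable vegetation events per line.

    Assumption: an event's 24-hour period starts at its first opening.
    """
    # Keep only vegetation-caused openings on in-scope lines (230 kV and above,
    # or designated critical). Openings logged with another cause, such as
    # severe weather, are dropped here.
    in_scope = [o for o in openings
                if o.cause == "vegetation" and (o.voltage_kv >= 230 or o.critical)]
    in_scope.sort(key=lambda o: (o.line_id, o.time))

    events = {}
    event_start = {}
    for o in in_scope:
        start = event_start.get(o.line_id)
        if start is None or o.time - start >= window:
            event_start[o.line_id] = o.time           # a new event begins
            events[o.line_id] = events.get(o.line_id, 0) + 1
        # otherwise the opening falls inside the current event's 24-hour period
    return events

t0 = datetime(2004, 6, 1, 13, 0)
log = [
    LineOpening("A", 345, t0, "vegetation"),
    LineOpening("A", 345, t0 + timedelta(minutes=5), "vegetation"),  # same event
    LineOpening("A", 345, t0 + timedelta(hours=30), "vegetation"),   # new event
    LineOpening("B", 138, t0, "vegetation"),                         # below 230 kV, not designated
    LineOpening("C", 138, t0, "vegetation", critical=True),          # designated critical
    LineOpening("D", 500, t0, "weather"),                            # excluded cause
]
print(count_vegetation_events(log))  # {'A': 2, 'C': 1}
```

Anchoring the window at the first opening keeps a reclose-and-retrip sequence as one event, matching the intent that a line trip may include momentary openings, reclosings, and a lockout.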

Recommendation 4b: Beginning with an effective date of January 1, 2004, each transmission 
operator will submit an annual report of all vegetation-related high-voltage line trips to its 
respective reliability region. Each region shall assemble a detailed annual report of vegetation-related 
line trips in the region and submit it to NERC no later than March 31 for the preceding year, with the 
first report to be completed by March 2005 for calendar year 2004. 

Vegetation management practices, including inspection and trimming requirements, can vary significantly 
with geography. Nonetheless, the events of August 14 and prior outages point to the need for 
independent verification that viable programs exist for ROW maintenance and that the programs are being 
followed. 

Recommendation 4c: Each bulk electric transmission owner shall make its vegetation management 
procedure, and documentation of work completed, available for review and verification upon 
request by the applicable regional reliability council, NERC, or applicable federal, state, or 
provincial regulatory agency. 

(NEW) Recommendation 4d: The Planning Committee, working with the Standards Authorization 
Committee, shall develop a measurable standard that specifies the minimum clearances between 
energized high voltage lines and vegetation. Appropriate criteria from the National Electrical 
Safety Code, or other appropriate code, should be adapted and interpreted so as to be applicable to 
vegetation. 

Recommendation 5: Establish a Program to Track Implementation of Recommendations. 

The August 14 blackout shared a number of contributing factors with prior large-scale blackouts, 
including: 

• Conductors contacting trees 

• Ineffective visualization of power system conditions and lack of situational awareness 

• Ineffective communications 

• Lack of training in recognizing and responding to emergencies 

• Insufficient static and dynamic reactive power supply 

• Need to improve relay protection schemes and coordination 

It is important that recommendations resulting from system outages be adopted consistently by all regions 
and operating entities, not just those directly affected by a particular outage. Several lessons learned prior 
to August 14, if heeded, could have prevented the outage. WECC and NPCC, for example, have 
programs that could be used as models for tracking completion of recommendations. NERC and some 
regions have not adequately tracked completion of recommendations from prior events to ensure they 
were consistently implemented. 

Recommendation 5a: NERC and each regional reliability council shall establish a program for 
documenting completion of recommendations resulting from the August 14 blackout and other 
historical outages, as well as NERC and regional reports on violations of reliability standards, 
results of compliance audits, and lessons learned from system disturbances. 

Regions shall report quarterly to NERC on the status of follow-up actions to address recommendations, 
lessons learned, and areas noted for improvement. NERC staff shall report both NERC activities and a 
summary of regional activities to the board. 

Recommendation 5b: NERC shall by January 1, 2005, establish a reliability performance 
monitoring function to evaluate and report bulk electric system reliability performance. 

Assuring compliance with reliability standards, evaluating the reliability readiness of reliability 
coordinators and control areas, and assuring recommended actions are achieved will be effective steps in 
reducing the chances of future large-scale outages. However, it is important for NERC to also adopt a 
process for continuous learning and improvement by seeking continuous feedback on reliability 
performance trends, and not rely mainly on learning from and reacting to catastrophic failures. 
Such a function would assess large-scale outages and near misses to determine root causes and lessons 
learned, similar to the August 14 blackout investigation. This function would incorporate the current 
Disturbance Analysis Working Group and expand that work to provide more proactive feedback to the 
NERC board regarding reliability performance. This program would also gather and analyze reliability 
performance statistics to inform the board of reliability trends. This function could develop procedures 
and capabilities to initiate investigations in the event of future large-scale outages or disturbances. Such 
procedures and capabilities would be shared between NERC and the regional reliability councils for use 
as needed, with NERC and regional investigation roles clearly defined in advance. 

Recommendation 6: Improve Operator and Reliability Coordinator Training. 

The investigation found that some reliability coordinators and control area operators had not received 
adequate training in recognizing and responding to system emergencies. Most notable was the lack of 
realistic simulations and drills for training and verifying the capabilities of operating personnel. This 
training deficiency contributed to the lack of situational awareness and failure to declare an emergency 
when operator intervention was still possible prior to the high-speed portion of the sequence of events. 

Recommendation 6: All reliability coordinators, control areas, and transmission operators shall 
provide at least five days per year of training and drills in system emergencies, using realistic 
simulations, for each staff person with responsibility for the real-time operation or reliability 
monitoring of the bulk electric system. This system emergency training is in addition to other 
training requirements. Five days of system emergency training and drills are to be completed prior 
to June 30, 2004, with credit given for documented training already completed since July 1, 2003. 
Training documents, including curriculum, training methods, and individual training records, are 
to be available for verification during reliability readiness audits. 

The term “realistic simulations” includes a variety of tools and methods that present operating personnel 
with situations to improve and test diagnostic and decision-making skills in an environment that 
resembles expected conditions during a particular type of system emergency. Although a full replica 
training simulator is one approach, lower cost alternatives such as PC-based simulators, tabletop drills, 
and simulated communications can be effective training aids if used properly. 

NERC has published Continuing Education Criteria specifying appropriate qualifications for continuing 
education providers and training activities. 

In the longer term, the NERC Personnel Certification Governance Committee (PCGC), which is 
independent of the NERC board, should explore expanding the certification requirements of system 
operating personnel to include additional measures of competency in recognizing and responding to 
system emergencies. The current NERC certification examination is a written test of the NERC 
Operating Manual and other references relating to operator job duties, and is not by itself intended to be a 
complete demonstration of competency to handle system emergencies. 

Recommendation 7: Evaluate Reactive Power and Voltage Control Practices. 

The blackout investigation identified inconsistent practices in northeastern Ohio with regard to the setting 
and coordination of voltage limits and insufficient reactive power supply. Although the deficiency of 
reactive power supply in northeastern Ohio did not directly cause the blackout, it was a contributing 
factor. 

Planning Standard II.B.S1 requires each regional reliability council to establish procedures for generating 
equipment data verification and testing, including reactive power capability. Planning Standard III.C.S1 
requires that all synchronous generators connected to the interconnected transmission systems be 
operated with their excitation systems in the automatic voltage control mode unless approved otherwise by 
the transmission system operator. Standard III.C.S2 also requires that generators maintain a 
network voltage or reactive power output as required by the transmission system operator within the 
reactive capability of the units. 

On one hand, the unsafe conditions on August 14 with respect to voltage in northeastern Ohio can be said 
to have resulted from violations of NERC planning criteria for reactive power and voltage control, and 
those violations should have been identified through the NERC and ECAR compliance monitoring 
programs (addressed by Recommendation 2). On the other hand, investigators believe reactive power and 
voltage control deficiencies noted on August 14 are also symptomatic of a systematic breakdown of the 
reliability studies and practices in FE and the ECAR region. As a result, unsafe voltage criteria were set 
and used in study models and operations. There were also issues identified with reactive characteristics 
of loads, as addressed in Recommendation 14. 

Recommendation 7a: The Planning Committee shall reevaluate within one year the effectiveness of 
the existing reactive power and voltage control standards and how they are being implemented in 
practice in the ten NERC regions. Based on this evaluation, the Planning Committee shall 
recommend revisions to standards or process improvements to ensure voltage control and stability 
issues are adequately addressed. 

Recommendation 7b: ECAR shall, no later than June 30, 2004, review its reactive power and 
voltage criteria and procedures, verify that its criteria and procedures are being fully implemented 
in regional and member studies and operations, and report the results to the NERC board. 

Recommendation 8: Improve System Protection to Slow or Limit the Spread of Future Cascading 
Outages. 

The importance of automatic control and protection systems in preventing, slowing, or mitigating the 
impact of a large-scale outage cannot be stressed enough. To underscore this point, following the trip of 
the Sammis-Star line at 4:06 p.m., the cascading failure into parts of eight states and two provinces, including 
the trip of over 500 generating units and over 400 transmission lines, was completed in the next eight 
minutes. Most of the event sequence, in fact, occurred in the final 12 seconds of the cascade. Likewise, 
the July 2, 1996, failure took less than 30 seconds and the August 10, 1996, failure took only five 
minutes. It is not practical to expect operators will always be able to analyze a massive, complex system 
failure and to take the appropriate corrective actions in a matter of a few minutes. The NERC 
investigators believe that two measures would have been crucial in slowing or stopping the uncontrolled 
cascade on August 14: 

• Better application of zone 3 impedance relays on high-voltage transmission lines 

• Selective use of under-voltage load shedding 

First, beginning with the Sammis-Star line trip, many of the remaining line trips during the cascade phase 
were the result of the operation of a zone 3 relay for a perceived overload (a combination of high amperes 
and low voltage) on the protected line. If used, zone 3 relays typically act as an overreaching backup to 
the zone 1 and 2 relays, and are not intentionally set to operate on a line overload. However, under 
extreme conditions of low voltages and large power swings as seen on August 14, zone 3 relays can 
operate for overload conditions and propagate the outage to a wider area by essentially causing the system 
to “break up”. Many of the zone 3 relays that operated during the August 14 cascading outage were not 
set with adequate margins above their emergency thermal ratings. For the short times involved, thermal 
heating is not a problem and the lines should not be tripped for overloads. Instead, power system 
protection devices should be set to address the specific condition of concern, such as a fault, out-of-step 
condition, etc., and should not compromise a power system’s inherent physical capability to slow down or 
stop a cascading event. 
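As a simplified illustration of the mechanism described above (a textbook relay model, not taken from the investigation), a distance relay measures the apparent impedance seen at the line terminal. Assuming a mho zone 3 characteristic with reach $Z_r$ along a maximum torque angle $\theta_m$ (both symbols introduced here for illustration), the relay operates when the apparent impedance falls inside the mho circle:

$$
Z_{\mathrm{app}} = \frac{V}{I} = |Z_{\mathrm{app}}|\,\angle\varphi, \qquad
\text{operate if } |Z_{\mathrm{app}}| < Z_r \cos(\theta_m - \varphi)
$$

Here $V$ and $I$ are the phase voltage and current at the relay location. Because $|Z_{\mathrm{app}}|$ falls both as voltage drops and as current rises, heavy loading at depressed voltage can move the operating point into the zone 3 circle even though no fault is present.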

Recommendation 8a: All transmission owners shall, no later than September 30, 2004, evaluate the 
zone 3 relay settings on all transmission lines operating at 230-kV and above for the purpose of 
verifying that each zone 3 relay is not set to trip on load under extreme emergency conditions. In 
each case that a zone 3 relay is set so as to trip on load under extreme conditions, the transmission 
operator shall reset, upgrade, replace, or otherwise mitigate the overreach of those relays as soon as 
possible and on a priority basis, but no later than December 31, 2005. Upon completing analysis of 
its application of zone 3 relays, each transmission owner may, no later than December 31, 2004, 
submit justification to NERC for applying zone 3 relays outside of these recommended parameters. 
The Planning Committee shall review such exceptions to ensure they do not increase the risk of 
widening a cascading failure of the power system. 

The investigation team recommends that the zone 3 relay, if used, should not operate at or below 150 
percent of the emergency ampere rating of a line, assuming a 0.85 per unit voltage and a line phase angle 
of 30 degrees. 
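A minimal sketch of the check above, assuming a mho zone 3 characteristic expressed in primary ohms and treating the 30-degree "line phase angle" as the angle of the apparent load impedance. The function name, the maximum torque angle parameter `mta_deg`, and the example line data are hypothetical; an actual settings review would work in secondary ohms and account for each relay's specific characteristic.

```python
import math

def zone3_passes_load_check(v_nom_kv, emergency_amps, reach_ohms, mta_deg,
                            v_pu=0.85, load_multiple=1.5, load_angle_deg=30.0):
    """Return True if a mho zone 3 element would NOT operate at the test load point.

    Test point per the recommendation: 150% of the emergency ampere rating at
    0.85 per-unit voltage and a 30-degree angle. The mho shape, primary-ohm
    reach, and max torque angle (mta_deg) are assumptions of this sketch.
    """
    v_phase = v_pu * v_nom_kv * 1000.0 / math.sqrt(3.0)  # line-to-neutral volts
    amps = load_multiple * emergency_amps
    z_load = v_phase / amps                               # apparent impedance magnitude, ohms
    # Mho reach along the load angle: Zr * cos(MTA - phi)
    reach = reach_ohms * math.cos(math.radians(mta_deg - load_angle_deg))
    return z_load > reach

# Hypothetical 345 kV line with a 2,000 A emergency rating and an 85-degree MTA.
# The test-point apparent impedance is about 56.4 ohms.
print(zone3_passes_load_check(345, 2000, reach_ohms=100, mta_deg=85))  # False (reach ~57.4 ohms)
print(zone3_passes_load_check(345, 2000, reach_ohms=90, mta_deg=85))   # True  (reach ~51.6 ohms)
```

In this example a 100-ohm reach would trip on load at the test point, while shortening the reach to 90 ohms (or otherwise reshaping the characteristic) would restore the recommended margin.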

A second key conclusion with regard to system protection was that if an automatic under-voltage load 
shedding scheme had been in place in the Cleveland-Akron area on August 14, there is a high probability 
the outage could have been limited to that area. 

Recommendation 8b: Each regional reliability council shall complete an evaluation of the feasibility 
and benefits of installing under-voltage load shedding capability in load centers within the region 
that could become unstable as a result of being deficient in reactive power following credible 
multiple-contingency events. The regions are to complete the initial studies and report the results 
to NERC within one year. The regions are requested to promote the installation of under-voltage 
load shedding capabilities within critical areas, as determined by the studies to be effective in 
preventing an uncontrolled cascade of the power system. 
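A minimal sketch of definite-time under-voltage load shedding logic, assuming each stage sheds a fixed block of load once the monitored voltage has stayed below its pickup for a set delay. All thresholds, delays, and block sizes below are hypothetical; actual settings would come from the regional studies the recommendation calls for, and real schemes typically add supervision (for example, multiple-phase voltage checks) that this sketch omits.

```python
from dataclasses import dataclass

@dataclass
class UvlsStage:
    pickup_pu: float      # stage timer runs while voltage is below this value
    delay_s: float        # continuous time below pickup before the stage operates
    block_mw: float       # load shed when the stage operates
    timer_s: float = 0.0
    operated: bool = False

def run_uvls(voltages_pu, dt_s, stages):
    """Step through uniformly sampled voltages; return total MW shed."""
    shed_mw = 0.0
    for v in voltages_pu:
        for stage in stages:
            if stage.operated:
                continue                      # each stage operates at most once
            if v < stage.pickup_pu:
                stage.timer_s += dt_s
                if stage.timer_s >= stage.delay_s:
                    stage.operated = True
                    shed_mw += stage.block_mw
            else:
                stage.timer_s = 0.0           # voltage recovered: reset the timer
    return shed_mw

def make_stages():
    # Hypothetical two-stage scheme
    return [UvlsStage(0.92, 3.0, 150.0), UvlsStage(0.90, 5.0, 150.0)]

dt = 0.5
print(run_uvls([0.91] * 16, dt, make_stages()))   # 8 s at 0.91 pu -> 150.0
print(run_uvls([0.88] * 16, dt, make_stages()))   # 8 s at 0.88 pu -> 300.0
print(run_uvls([0.88] * 4 + [1.0] + [0.88] * 4, dt, make_stages()))  # brief dips -> 0.0
```

The time delay is what keeps the scheme from shedding load for short voltage dips during faults, while a sustained reactive deficiency still triggers progressively deeper shedding.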

The NERC investigation of the August 14 blackout has identified additional transmission and generation 
control and protection issues requiring further analysis. One concern is that generating unit control and 
protection schemes need to consider the full range of possible extreme system conditions, such as the low 
voltages and low and high frequencies experienced on August 14. The team also noted that 
improvements may be needed in underfrequency load shedding and its coordination with generator 
under- and over-frequency protection and controls. 

Recommendation 8c: The Planning Committee shall evaluate Planning Standard III — System 
Protection and Control and propose within one year specific revisions to the criteria to adequately 
address the issue of slowing or limiting the propagation of a cascading failure. The board directs 
the Planning Committee to evaluate the lessons from August 14 regarding relay protection design 
and application and offer additional recommendations for improvement. 

Recommendation 9: Clarify Reliability Coordinator and Control Area Functions, Responsibilities, 
Capabilities, and Authorities. 

Ambiguities in the NERC operating policies may have allowed entities involved in the August 14 
blackout to make different interpretations regarding the functions, responsibilities, capabilities, and 
authorities of reliability coordinators and control areas. Characteristics and capabilities necessary to 
enable prompt recognition and effective response to system emergencies must be specified. 

The lack of timely and accurate outage information resulted in degraded performance of state estimator 
and reliability assessment functions on August 14. There is a need to review options for sharing of outage 
information in the operating time horizon (e.g., 15 minutes or less), so as to ensure the accurate and 
timely sharing of outage data necessary to support real-time operating tools such as state estimators, 
real-time contingency analysis, and other system monitoring tools. 
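To illustrate why timeliness matters, the sketch below applies outage reports to a model's branch list and flags any report older than the 15-minute horizon mentioned above, since a state estimator built on stale status data may be solving the wrong topology. The record layout, the "no report means in service" default, and the function name are hypothetical choices for this sketch.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(minutes=15)  # operating time horizon cited in the text

def apply_outage_reports(branches, reports, now):
    """Return (in_service, stale) branch lists.

    reports maps branch id -> (is_out, reported_at); record layout is hypothetical.
    Stale reports are still applied but flagged, so operators know the
    model topology for those branches may not reflect current conditions.
    """
    in_service, stale = [], []
    for branch in branches:
        report = reports.get(branch)
        if report is None:
            in_service.append(branch)     # no report: assume the branch is in service
            continue
        is_out, reported_at = report
        if now - reported_at > HORIZON:
            stale.append(branch)
        if not is_out:
            in_service.append(branch)
    return in_service, stale

now = datetime(2004, 1, 1, 14, 0)
reports = {
    "L1": (True, now - timedelta(minutes=5)),    # recent outage report
    "L2": (False, now - timedelta(minutes=40)),  # old "in service" report
}
print(apply_outage_reports(["L1", "L2", "L3"], reports, now))
# (['L2', 'L3'], ['L2'])
```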

On August 14, reliability coordinator and control area communications regarding conditions in 
northeastern Ohio were ineffective, and in some cases confusing. Ineffective communications contributed 
to a lack of situational awareness and precluded effective actions to prevent the cascade. Consistent 
application of effective communications protocols, particularly during emergencies, is essential to 
reliability. Alternatives to one-on-one phone calls should be considered during an emergency to ensure 
all parties are getting timely and accurate information with a minimum number of calls. 

NERC operating policies do not adequately specify critical facilities, leaving ambiguity regarding which 
facilities must be monitored by reliability coordinators. Nor do the policies adequately define criteria for 
declaring transmission system emergencies. Operating policies should also clearly specify that curtailing 
interchange transactions through the NERC TLR procedure is not intended to be used as a method for 
restoring the system from an actual Operating Security Limit violation to a secure operating state. 

The Operating Committee shall complete the following by June 30, 2004: 

• Evaluate and revise the operating policies and procedures, or provide interpretations, to ensure 
reliability coordinator and control area functions, responsibilities, and authorities are completely 
and unambiguously defined. 

• Evaluate and improve the tools and procedures for operator and reliability coordinator 
communications during emergencies. 

• Evaluate and improve the tools and procedures for the timely exchange of outage information 
among control areas and reliability coordinators. 

Recommendation 10: Establish Guidelines for Real-Time Operating Tools. 

The August 14 blackout was caused by a lack of situational awareness that was in turn the result of 
inadequate reliability tools and backup capabilities. Additionally, the failure of the FE control computers 
and alarm system contributed directly to the lack of situational awareness. Likewise, MISO’s incomplete 
tool set and the failure of its state estimator to work effectively on August 14 contributed to the lack of 
situational awareness. 

Recommendation 10: The Operating Committee shall, within one year, evaluate the real-time 
operating tools necessary for reliable operation and reliability coordination, including backup 
capabilities. The Operating Committee is directed to report both minimum acceptable capabilities 
for critical reliability functions and a guide of best practices. 

This evaluation should include consideration of the following: 

• Modeling requirements, such as model size and fidelity, real and reactive load modeling, 
sensitivity analyses, accuracy analyses, validation, measurement, observability, update 
procedures, and procedures for the timely exchange of modeling data. 
• State estimation requirements, such as periodicity of execution, monitoring external facilities, 
solution quality, topology error and measurement error detection, failure rates including times 
between failures, presentation of solution results including alarms, and troubleshooting 
procedures. 

• Real-time contingency analysis requirements, such as contingency definition, periodicity of 
execution, monitoring external facilities, solution quality, post-contingency automatic 
actions, failure rates including mean/maximum times between failures, reporting of results, presentation 
of solution results including alarms, and troubleshooting procedures including procedures for 
investigating non-converging contingency studies. 
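To make the state estimation items concrete, here is a minimal sketch of a weighted least squares (WLS) estimator with a chi-squared bad-data test, a common textbook formulation. It assumes a linearized (DC) model of a hypothetical three-bus network with bus 1 as the angle reference; production state estimators solve the nonlinear AC problem and also detect topology errors, which this sketch does not attempt.

```python
def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def wls_estimate(H, z, sigma):
    """x = (H^T W H)^-1 H^T W z with W = diag(1/sigma^2); also return J(x)."""
    w = [1.0 / s ** 2 for s in sigma]
    rows, n = len(z), len(H[0])
    G = [[sum(w[k] * H[k][i] * H[k][j] for k in range(rows)) for j in range(n)]
         for i in range(n)]
    rhs = [sum(w[k] * H[k][i] * z[k] for k in range(rows)) for i in range(n)]
    x = solve(G, rhs)
    resid = [z[k] - sum(H[k][j] * x[j] for j in range(n)) for k in range(rows)]
    J = sum(w[k] * resid[k] ** 2 for k in range(rows))   # weighted sum of squared residuals
    return x, J

# Hypothetical 3-bus DC model: bus 1 is the reference (angle 0); states are
# theta2 and theta3 in radians. Line reactances: 1-2 = 0.1, 1-3 = 0.2, 2-3 = 0.25 pu.
H = [
    [-10.0, 0.0],   # flow 1->2 = (0 - theta2) / 0.1
    [0.0, -5.0],    # flow 1->3 = (0 - theta3) / 0.2
    [4.0, -4.0],    # flow 2->3 = (theta2 - theta3) / 0.25
    [14.0, -4.0],   # net injection at bus 2
]
sigma = [0.01] * 4
CHI2_95_DOF2 = 5.991  # 95th percentile of chi-squared with m - n = 2 degrees of freedom

z_good = [0.5, 0.4, 0.12, -0.38]      # consistent with theta2 = -0.05, theta3 = -0.08
x, J = wls_estimate(H, z_good, sigma)
print([round(v, 4) for v in x], J > CHI2_95_DOF2)   # [-0.05, -0.08] False

z_bad = [0.9, 0.4, 0.12, -0.38]       # gross error on the 1->2 flow
x_bad, J_bad = wls_estimate(H, z_bad, sigma)
print(J_bad > CHI2_95_DOF2)           # True: bad data suspected
```

The redundancy of four measurements for two states is what makes the bad-data test possible; with no redundancy, every error would be absorbed into the state and the residual would stay at zero.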

Recommendation 11: Evaluate Lessons Learned During System Restoration. 

The efforts to restore the power system and customer service following the outage were effective, 
considering the massive amount of load lost and the large number of generators and transmission lines 
that tripped. Fortunately, the restoration was aided by the ability to energize transmission from 
neighboring systems, thereby speeding the recovery. Despite the apparent success of the restoration 
effort, it is important to evaluate the results in more detail to determine opportunities for improvement. 
Blackstart and restoration plans are often developed through study of simulated conditions. Robust testing 
of live systems is difficult because of the risk of disturbing the system or interrupting customers. The 
August 14 blackout provides a valuable opportunity to apply actual events and experiences to learn to 
better prepare for system blackstart and restoration in the future. That opportunity should not be lost, 
despite the relative success of the restoration phase of the outage. 

Recommendation 11a: The Planning Committee, working in conjunction with the Operating 
Committee, NPCC, ECAR, and PJM, shall evaluate the blackstart and system restoration 
performance following the outage of August 14, and within one year report to the NERC board the 
results of that evaluation with recommendations for improvement. 

Recommendation 11b: All regional reliability councils shall, within six months of the Planning 
Committee report to the NERC board, reevaluate their procedures and plans to assure an effective 
blackstart and restoration capability within their region. 

Recommendation 12: Install Additional Time-Synchronized Recording Devices as Needed. 

A valuable lesson from the August 14 blackout is the importance of having time-synchronized system 
data recorders. NERC investigators labored over thousands of data items to synchronize the sequence of 
events, much like putting together small pieces of a very large puzzle. That process would have been 
significantly improved and sped up if there had been a sufficient number of synchronized data recording 
devices. 

NERC Planning Standard I.F — Disturbance Monitoring does require location of recording devices for 
disturbance analysis. Often, recorders are available, but they are not synchronized to a time 
standard. All digital fault recorders, digital event recorders, and power system disturbance recorders 
should be time stamped at the point of observation with a precise Global Positioning System (GPS) 
synchronizing signal. Recording and time-synchronization equipment should be monitored and calibrated 
to assure accuracy and reliability. 
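The value of a common time reference shows up when building a sequence of events. This sketch, using hypothetical recorder data and function names, corrects each recorder's timestamps by a measured clock offset (zero for a GPS-synchronized device) and merges the already time-ordered lists into a single timeline with the standard library's `heapq.merge`.

```python
import heapq
from datetime import datetime, timedelta

def correct_clock(records, clock_offset):
    """Subtract a recorder's clock offset (how far its clock runs ahead of true time)."""
    return [(t - clock_offset, source, event) for t, source, event in records]

def sequence_of_events(recorders):
    """Merge per-recorder lists, each already in time order, into one timeline.

    recorders: list of (records, clock_offset); records are (time, source, event).
    """
    corrected = [correct_clock(records, offset) for records, offset in recorders]
    return list(heapq.merge(*corrected, key=lambda rec: rec[0]))

# Hypothetical data: recorder A is GPS-synchronized; recorder B's clock runs 2 s fast.
base = datetime(2004, 1, 1, 12, 0, 0)
rec_a = [(base + timedelta(seconds=1.0), "A", "line X trip"),
         (base + timedelta(seconds=3.0), "A", "unit G trip")]
rec_b = [(base + timedelta(seconds=2.5), "B", "line Y trip")]

timeline = sequence_of_events([(rec_a, timedelta(0)), (rec_b, timedelta(seconds=2))])
print([event for _, _, event in timeline])
# ['line Y trip', 'line X trip', 'unit G trip']
```

If recorder B's offset were left uncorrected, line Y would appear to trip after line X, reversing the apparent cause and effect. In practice the offset of an unsynchronized recorder is rarely known this precisely, which is why stamping at the point of observation is preferable to correcting afterward.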

Time-synchronized devices, such as phasor measurement units, can also be beneficial for monitoring a 
wide-area view of power system conditions in real time, as demonstrated in WECC with its 
Wide-Area Monitoring System (WAMS). 

Recommendation 12a: The reliability regions, coordinated through the NERC Planning 
Committee, shall within one year define regional criteria for the application of synchronized 
recording devices in power plants and substations. Regions are requested to facilitate the 
installation of an appropriate number, type, and location of devices within the region as soon as 
practical to allow accurate recording of future system disturbances and to facilitate benchmarking 
of simulation studies by comparison to actual disturbances. 

Recommendation 12b: Facilities owners shall, in accordance with regional criteria, upgrade 
existing dynamic recorders to include GPS time synchronization and, as necessary, install 
additional dynamic recorders. 

Recommendation 13: Reevaluate System Design, Planning, and Operating Criteria. 

The investigation report noted that FE entered the day on August 14 with insufficient resources to stay 
within operating limits following a credible set of contingencies, such as the loss of the Eastlake 5 unit 
and the Chamberlin-Harding line. NERC will conduct an evaluation of operations planning practices and 
criteria to ensure expected practices are sufficient and well understood. The review will reexamine 
fundamental operating criteria, such as n-1 and the 30-minute limit in preparing the system for a next 
contingency, and Table I Category C.3 of the NERC planning standards. Operations planning and 
operating criteria will be identified that are sufficient to ensure the system is in a known and reliable 
condition at all times, and that positive controls, whether manual or automatic, are available and 
appropriately located at all times to return the Interconnection to a secure condition. Daily operations 
planning and subsequent real-time operations planning will identify available system reserves to meet 
operating criteria. 

Recommendation 13a: The Operating Committee shall evaluate operations planning and operating 
criteria and recommend revisions in a report to the board within one year. 

Prior studies in the ECAR region did not adequately define the system conditions that were observed on 
August 14. Severe contingency criteria were not adequate to address the events of August 14 that led to 
the uncontrolled cascade. Also, northeastern Ohio was found to have insufficient reactive support to 
serve its loads and meet import criteria. Instances were also noted in the FE system and ECAR area of 
different ratings being used for the same facility by planners and operators and among entities, making 
the models used for system planning and operation suspect. NERC and the regional reliability councils 
must take steps to assure facility ratings are being determined using consistent criteria and being 
effectively shared and reviewed among entities and among planners and operators. 

Recommendation 13b: ECAR shall, no later than June 30, 2004, reevaluate its planning and study 
procedures and practices to ensure they are in compliance with NERC standards, ECAR Document 
No. 1, and other relevant criteria; and that ECAR and its members’ studies are being implemented 
as required. 

Recommendation 13c: The Planning Committee, working in conjunction with the regional 
reliability councils, shall within two years reevaluate the criteria, methods, and practices used for 
system design, planning, and analysis; and shall report the results and recommendations to the 
NERC board. This review shall include an evaluation of transmission facility ratings methods and 
practices, and the sharing of consistent ratings information. 

Regional reliability councils may consider assembling a regional database that includes the ratings of all 
bulk electric system (100-kV and higher voltage) transmission lines, transformers, phase angle regulators, 
and phase shifters. This database should be shared with neighboring regions as needed for system 
planning and analysis. 

NERC and the regional reliability councils should review the scope, frequency, and coordination of 
interregional studies, to include the possible need for simultaneous transfer studies. Study criteria will be 
reviewed, particularly the maximum credible contingency criteria used for system analysis. Each control 
area will be required to identify, for both the planning and operating time horizons, the planned 
emergency import capabilities for each major load area. 

Recommendation 14: Improve System Modeling Data and Data Exchange Practices. 

The after-the-fact models developed to simulate August 14 conditions and events indicate that dynamic 
modeling assumptions, including generator and load power factors, used in planning and operating 
models were inaccurate. Of particular note, the assumptions of load power factor were overly optimistic 
(loads were absorbing much more reactive power than pre-August 14 models indicated). Another 
suspected problem is the modeling of shunt capacitors under depressed voltage conditions. Regional 
reliability councils should establish regional power system models that enable the sharing of consistent, 
validated data among entities in the region. Power flow and transient stability simulations should be 
periodically compared (benchmarked) with actual system events to validate model data. Viable load 
(including load power factor) and generator testing programs are necessary to improve agreement 
between power flows and dynamic simulations and the actual system performance. 

Recommendation 14: The regional reliability councils shall, within one year, establish and begin 
implementing criteria and procedures for validating data used in power flow models and dynamic 
simulations by benchmarking model data with actual system performance. Validated modeling 
data shall be exchanged on an interregional basis as needed for reliable system planning and 
operation. 
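The benchmarking step described above (comparing simulated quantities with values recorded during an actual event) can be sketched as a simple tolerance check in Python. This is a hypothetical illustration: the quantity names, values, and 5 percent tolerance are assumptions, and a real validation program would define its own quantities and acceptance criteria. The example mirrors the finding that loads absorbed more reactive power than the models indicated.

```python
# Hypothetical benchmarking check: compare quantities from a power flow or
# dynamic simulation against values recorded during an actual event, and
# flag those whose relative error exceeds a tolerance (assumed 5 percent).


def benchmark(simulated, measured, tolerance=0.05):
    """Return {quantity: (simulated, measured, relative_error)} for mismatches."""
    flagged = {}
    for name, meas in measured.items():
        if name not in simulated:
            continue  # quantity not represented in the model
        sim = simulated[name]
        # Guard against division by zero for quantities measured near zero.
        rel_error = abs(sim - meas) / max(abs(meas), 1e-9)
        if rel_error > tolerance:
            flagged[name] = (sim, meas, rel_error)
    return flagged


if __name__ == "__main__":
    simulated = {"line_A_MW": 1000.0, "area_load_MVAr": 400.0}
    measured = {"line_A_MW": 1010.0, "area_load_MVAr": 520.0}
    for name, (sim, meas, err) in benchmark(simulated, measured).items():
        print(f"{name}: simulated {sim}, measured {meas}, error {err:.1%}")
```

Here the real-power flow agrees within tolerance while the hypothetical area reactive load is flagged, which would point to the load power factor assumption as a candidate for correction.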

(NEW) Recommendation 15: Develop a standing capability for NERC to investigate future 
blackouts and disturbances. 

NERC shall develop and be prepared to implement a NERC standing procedure for investigating future 
blackouts and system disturbances. Many of the methods, tools, and lessons from the investigation of the 
August 14 blackout are appropriate for adoption. 

(NEW) Recommendation 16: Accelerate the standards transition. 

NERC shall accelerate the transition from existing operating policies, planning standards, and compliance 
templates to a clear and measurable set of reliability standards. (This recommendation is consistent with 
Task Force Recommendation 25.) 

(NEW) Recommendation 17: Evaluate NERC actions in the areas of cyber and physical security. 

The Critical Infrastructure Protection Committee shall evaluate the U.S.-Canada Power System Outage 
Task Force’s Group III recommendations to determine if any actions are needed by NERC and report a 
proposed action plan to the board. 
4. Specific Actions Directed to FE, MISO, and PJM 
Corrective Actions to be Completed by FirstEnergy 

FE shall complete the following corrective actions by June 30, 2004. Unless otherwise stated, the 
requirements apply to the FE northern Ohio system and connected generators. 

1. Voltage Criteria and Reactive Resources 

a. Interim Voltage Criteria. The investigation team found that FE was not operating on August 14 
within NERC planning and operating criteria with respect to its voltage profile and reactive 
power supply margin in the Cleveland-Akron area. FE was also operating outside its own 
historical planning and operating criteria that were developed and used by Centerior Energy 
Corporation (The Cleveland Electric Illuminating Company and the Toledo Edison Company) 
prior to 1998 to meet the relevant NERC and ECAR standards and criteria. FE's stated acceptable 
voltage ranges are not compatible with those of neighboring systems or of interconnected systems in 
general. 

Until such time that the study of the northern Ohio system ordered by the Federal Energy 
Regulatory Commission (FERC) on December 23 is completed, and until FE is able to determine 
(in b. below) a current set of voltage and reactive requirements verified to be within NERC and 
ECAR criteria, FE shall immediately operate such that all 345-kV buses in the Cleveland-Akron area 
maintain a minimum voltage of 0.95 per unit following the simultaneous loss of the two largest 
generating units in that area. 

b. Calculation of Minimum Bus Voltages and Reactive Reserves. FE shall, consistent with or as 
part of the FERC-ordered study, determine the minimum location-specific voltages at all 345-kV 
and 138-kV buses and all generating stations within its control area (including merchant 
plants). FE shall determine the minimum dynamic reactive reserves that must be maintained in 
local areas to ensure that these minimum voltages are met following contingencies studied in 
accordance with ECAR Document 1. Criteria and minimum voltage requirements must comply 
with NERC planning criteria, including Table 1 A, Category C3, and Operating Policy 2. 

c. Voltage Procedures. FE shall determine voltage and reactive criteria and procedures that enable 
operators to understand these criteria and to operate to them. 

d. Study Results. When the FERC-ordered study is completed, FE is to adopt the planning and 
operating criteria determined as a result of that study and update the operating criteria and 
procedures for its system operators. If the study indicates a need for system reinforcements, FE 
shall develop a plan to implement such reinforcements as soon as practical, and shall develop 
operational procedures or other mitigating programs to maintain safe operating conditions until 
such time that the necessary system reinforcements can be made. 

e. Reactive Resources. FE shall inspect all reactive resources, including generators, and assure that 
all are fully operational. FE shall verify that all installed capacitors have no blown fuses and that 
at least 98 percent of installed capacitors at 69 kV and higher are available and in service during 
summer 2004. 

f. Communications. FE shall communicate its voltage criteria and procedures, as described in the 
items above, to MISO and to FE's neighboring systems. 
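Three of the quantitative requirements above (the interim 0.95 per-unit minimum at 345-kV buses, minimum dynamic reactive reserves, and 98 percent capacitor availability at 69 kV and higher) reduce to simple checks once study and inventory data are available. The Python sketch below is hypothetical: the input formats, bus and unit data, and the definition of dynamic reserve as unused generator capability (Qmax minus Q) are assumptions, and the post-contingency voltages would have to come from power flow studies that are not shown.

```python
# Hypothetical readiness checks for the interim criteria. All inputs are
# illustrative; post-contingency voltages would come from power flow studies
# of the simultaneous loss of the two largest units (not shown here).

NOMINAL_KV = 345.0
MIN_PU = 0.95  # interim minimum for 345-kV buses: 0.95 x 345 kV = 327.75 kV


def buses_below_minimum(post_contingency_kv, nominal_kv=NOMINAL_KV, min_pu=MIN_PU):
    """Names of buses whose post-contingency voltage is below min_pu."""
    return sorted(
        bus for bus, kv in post_contingency_kv.items() if kv / nominal_kv < min_pu
    )


def dynamic_reactive_reserve_mvar(units):
    """Sum of unused reactive capability (Qmax - Q) over in-service units.

    units: iterable of (q_mvar, qmax_mvar, in_service).
    """
    return sum(max(qmax - q, 0.0) for q, qmax, in_service in units if in_service)


def capacitors_meet_target(banks, min_kv=69.0, target_percent=98):
    """True if at least target_percent of banks at or above min_kv are in service.

    banks: iterable of (voltage_kv, in_service). Integer arithmetic avoids
    floating-point rounding exactly at the threshold.
    """
    eligible = [in_service for kv, in_service in banks if kv >= min_kv]
    in_service_count = sum(1 for ok in eligible if ok)
    return in_service_count * 100 >= target_percent * len(eligible)


if __name__ == "__main__":
    print(buses_below_minimum({"bus_1": 340.0, "bus_2": 327.0}))
    print(dynamic_reactive_reserve_mvar([(100.0, 150.0, True), (50.0, 200.0, False)]))
    print(capacitors_meet_target([(138.0, True)] * 49 + [(138.0, False)]))
```

Counting banks rather than MVAr for the 98 percent test is itself an assumption; the requirement could equally be read on a capacity basis.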
2. Operational Preparedness and Action Plan 

The FE 2003 Summer Assessment was not considered to be sufficiently comprehensive to cover a wide 
range of known and expected system conditions, nor effective for the August 14 conditions, for the 
following reasons: 

• No voltage stability assessment of the Cleveland-Akron area was included, although the area has a 
long-known history of potential voltage collapse, as indicated by CEI studies prior to 1997, by 
nonconvergence of power flow studies in the 1998 analysis, and by advice from AEP of potential 
voltage collapse prior to the onset of the 2003 summer load period. 

• Only single contingencies were tested, for essentially one set of 2003 study conditions. This does 
not comply with the study requirements of ECAR Document 1. 

• Study conditions should have assumed a wider range of generation dispatch and import/export 
and interregional transfers. For example, imports from MECS (north-to-south transfers) are 
likely to be less stressful to the FE system than imports from AEP (south-to-north transfers). 
Sensitivity studies should have been conducted to assess the impact of each key parameter and 
derive the system operating limits accordingly based on the most limiting of transient stability, 
voltage stability, and thermal capability. 

• The 2003 study conditions are considered to be more onerous than those assumed in the 1998 
study, since the former has Davis-Besse (830 MW) on scheduled outage. However, unlike the 1998 
study, the 2003 study does not show any voltage instability problems. 

• The 2003 study conditions are far less onerous than the actual August 14 conditions from the 
generation and transmission availability viewpoint. This is another indication that n-1 
contingency assessment, based on one assumed system condition, is not sufficient to cover the 
variability of changing system conditions due to forced outages. 
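The third bullet calls for operating limits derived from the most limiting of transient stability, voltage stability, and thermal capability, across a wider range of study conditions. Once each scenario has been simulated, selecting the governing limit is a minimum over criteria and then over scenarios, as in this hypothetical Python sketch. The scenario names echo the north-to-south and south-to-north transfer directions mentioned above, but every MW value is an assumption for illustration only.

```python
# Hypothetical study results: transfer limits (MW) for each studied condition,
# one value per limiting criterion. Each value would come from a separate
# simulation (not shown).


def most_limiting(results):
    """Return per-scenario limits and the overall most limiting case.

    results: {scenario: {criterion: limit_mw}}
    Returns ({scenario: (limit_mw, criterion)}, (limit_mw, scenario, criterion)).
    """
    if not results:
        raise ValueError("no study results supplied")
    per_scenario = {}
    for scenario, limits in results.items():
        # The binding criterion in a scenario is the one with the lowest limit.
        criterion, limit_mw = min(limits.items(), key=lambda item: item[1])
        per_scenario[scenario] = (limit_mw, criterion)
    # A limit valid across all studied conditions is the lowest of these.
    scenario, (limit_mw, criterion) = min(
        per_scenario.items(), key=lambda item: item[1][0]
    )
    return per_scenario, (limit_mw, scenario, criterion)


if __name__ == "__main__":
    results = {
        "north_to_south": {"transient_stability": 3000.0, "voltage_stability": 2800.0, "thermal": 2600.0},
        "south_to_north": {"transient_stability": 2900.0, "voltage_stability": 2100.0, "thermal": 2500.0},
    }
    per_scenario, overall = most_limiting(results)
    print(per_scenario)
    print(overall)
```

Keeping the per-scenario results, rather than only the overall minimum, shows which criterion binds under which condition; a single-condition study, as the last bullet notes, would hide that variation.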

FE shall prepare and submit to ECAR, with a copy to NERC, an Operational Preparedness and Action 
Plan to ensure system security and full compliance with NERC and [regional] planning and operating 
criteria, including ECAR Document 1. The action plan shall include, but not be limited to, the following: 

a. 2004 Summer Studies. Complete a 2004 summer study to identify a comprehensive set of 
System Operating Limits (SOLs) and Interconnection Reliability Limits (IRLs) based on the 
NERC Operating Limit Definition Task Force Report. Any interdependencies between FE's SOLs 
and those of its neighboring entities, as well as known and forecasted regional and interregional 
transfers, shall be included in the derivation of the SOLs and IRLs. 

b. Extreme Contingencies. Identify high-risk contingencies that are beyond normal studied criteria 
and determine the performance of the system for these contingencies. Where these extreme 
contingencies result in cascading outages, determine means to reduce their probability of 
occurrence or impact. These contingencies and mitigation plans must be communicated to FE 
operators, ECAR, MISO, and neighboring systems. 

c. Maximum Import Capability. Determine the maximum import capability into the Cleveland- 
Akron area for the summer of 2004, consistent with the criteria stated in (1) above and all 
applicable NERC and ECAR criteria. The maximum import capability shall take into account 
historical and forecasted transactions and outage conditions expected with due regard to 
maintaining adequate operating and local dynamic reactive reserves. 
d. Vegetation Management. FE was found not to be complying with its own procedures for 
rights-of-way maintenance and was not adequately resolving inspection and forced outage reports 
indicating persistent problems with vegetation contacts prior to August 14, 2003. FE shall 
complete rights-of-way trimming for all 345-kV and 138-kV transmission lines, so as to be in 
compliance with the National Electrical Safety Code criteria for safe clearances for overhead 
conductors, other applicable federal, state and local laws, and FE rights-of-way maintenance 
procedures. Priority should be placed on completing work for all 345-kV lines as soon as 
possible. FE will report monthly progress to NERC and ECAR. 

e. Line Ratings. FE shall reevaluate its criteria for calculating line ratings, survey the 345-kV and 
138-kV rights-of-way by visual inspection to ensure line ratings are appropriate for the clearances 
observed, and calculate updated ratings for each line. FE shall ensure that system operators, 
MISO, ECAR, NERC (MMWG), and neighboring systems are informed of and able to use the 
updated line ratings. 
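Item c above asks for the maximum import capability into the Cleveland-Akron area. One common way to search for such a limit, assuming security degrades monotonically as imports rise, is bisection over the import level, with each trial checked by a power flow. The Python sketch below substitutes a hypothetical linearized model for those power flow runs; the voltage and reactive reserve sensitivities and the 400 MVAr reserve floor are assumptions, while the 0.95 per-unit floor is the interim criterion from item 1.

```python
# Bisection search for the largest secure import level, assuming security is
# monotone: secure below some threshold, insecure above it.


def max_import_mw(is_secure, low_mw, high_mw, tol_mw=1.0):
    """Largest import in [low_mw, high_mw] found secure, to within tol_mw."""
    if not is_secure(low_mw):
        raise ValueError("system is not secure even at the lower bound")
    if is_secure(high_mw):
        return high_mw
    while high_mw - low_mw > tol_mw:
        mid_mw = (low_mw + high_mw) / 2.0
        if is_secure(mid_mw):
            low_mw = mid_mw  # limit lies above mid_mw
        else:
            high_mw = mid_mw  # limit lies below mid_mw
    return low_mw


def example_is_secure(import_mw):
    """Hypothetical linearized stand-in for a power flow at this import level."""
    min_voltage_pu = 1.02 - 0.00005 * import_mw
    reactive_reserve_mvar = 1200.0 - 0.4 * import_mw
    return min_voltage_pu >= 0.95 and reactive_reserve_mvar >= 400.0


if __name__ == "__main__":
    print(round(max_import_mw(example_is_secure, 0.0, 3000.0), 1))
```

In this invented model the voltage floor binds near 1,400 MW before the reactive reserve floor (about 2,000 MW) does. With real studies, each call to the security check would be a full contingency analysis at that import level, so limiting the number of trials matters.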

3. Emergency Response Capabilities and Preparedness 

a. Emergency Response Resources. FE shall develop a capability, no later than June 30, 2004, to 
reduce load in the Cleveland-Akron area by 1,500 MW within ten minutes of a directive to do so 
by MISO or the FE system operator. Such a capability may be provided by automatic or manual 
load shedding, voltage reduction, direct-controlled commercial or residential load management, 
or any other method or combination of methods capable of achieving the 1,500 MW of reduction 
in ten minutes without adversely affecting other interconnected systems. The amount of required 
load reduction capability may be reduced to an amount shown by the FERC-ordered study to be 
sufficient for response to severe contingencies, if approved by ECAR and NERC. 

b. Emergency Response Plan. FE shall develop emergency response plans, including plans to 
deploy the load reduction capabilities noted above. The plan shall include criteria for declaring 
an emergency and various states of emergency. The plan shall include detailed descriptions of 
authorities, operating procedures, and communication protocols with all the relevant entities 
including MISO, FE operators, and market participants within the FE area that have the ability to 
move generation or shed load upon orders from FE operators. The plan shall include procedures 
for load restoration after the declaration that the FE system is no longer in the emergency 
operating state. 
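Item a requires the ability to reduce Cleveland-Akron load by 1,500 MW within ten minutes, using any combination of methods. A basic feasibility check tallies only the resources that can deliver their full reduction within the window, as in the Python sketch below. The resource names, MW amounts, and response times are hypothetical, and the check does not address the separate requirement that the reduction not adversely affect other interconnected systems.

```python
from dataclasses import dataclass


@dataclass
class LoadReductionResource:
    name: str
    mw: float
    minutes_to_full: float  # time from directive to full reduction


def mw_within(resources, window_min=10.0):
    """MW that can be fully delivered within window_min of a directive.

    Resources slower than the window are conservatively counted as zero.
    """
    return sum(r.mw for r in resources if r.minutes_to_full <= window_min)


def meets_requirement(resources, required_mw=1500.0, window_min=10.0):
    return mw_within(resources, window_min) >= required_mw


if __name__ == "__main__":
    portfolio = [
        LoadReductionResource("automatic_load_shedding", 600.0, 0.5),
        LoadReductionResource("manual_load_shedding", 700.0, 8.0),
        LoadReductionResource("voltage_reduction", 150.0, 2.0),
        LoadReductionResource("direct_load_control", 200.0, 15.0),
    ]
    print(mw_within(portfolio), meets_requirement(portfolio))
```

In this hypothetical portfolio the direct load control program responds too slowly to count, leaving the total short of 1,500 MW; the same tally could be rerun against a lower requirement if the FERC-ordered study supported one.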

4. Operating Center and Training 

a. Operator Communications. FE shall develop communications procedures for FE operating 
personnel to use within FE, with MISO and neighboring systems, and others. The procedure and 
the operating environment within the FE system control center shall allow focus on reliable 
system operation and avoid distractions such as calls from customers and others who are not 
responsible for operation of a portion of the transmission system. 

b. Reliability Monitoring Tools. FE shall ensure its state estimation and real-time contingency 
analysis functions are being used to reliably execute full contingency analysis automatically every 
ten minutes, or on demand, and to alarm operators of potential first contingency violations. 

c. System Visualization Tools. FE shall provide its operating personnel with the capability to 
visualize the status of the power system from an overview perspective and to determine critical 
system failures or unsafe conditions quickly without multiple-step searches for failures. A 
dynamic map board or equivalent capability is encouraged. 
d. Backup Functions and Center. FE shall develop and prepare to implement a plan for the loss of 
its system operating center or any portion of its critical operating functions. FE shall comport 
with the criteria of the NERC Reference Document — Back Up Control Centers, and ensure that 
FE is able to continue meeting all NERC and ECAR criteria in the event the operating center 
becomes unavailable. Consideration should be given to using capabilities at MISO or 
neighboring systems as a backup capability, at least for summer 2004, until alternative backup 
functionality can be provided. 

e. GE XA21 System Updates. Until the current energy management system is replaced, FE shall 
incorporate all fixes for the GE XA21 system known to be necessary to assure reliable and stable 
operation of critical reliability functions, and particularly to correct the alarm processor failure 
that occurred on August 14, 2003. 

f. Operator Training. Prior to June 30, 2004, FE shall meet the operator training requirements 
detailed in NERC Recommendation 6. 

g. Technical Support. FE shall develop and implement a written procedure describing the 
interactions between control center technical support personnel and system operators. The 
procedure shall address notification of loss of critical functionality and testing procedures. 
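Item b above requires contingency analysis to run automatically every ten minutes (or on demand) and to alarm operators of potential first contingency violations. The Python sketch below shows two pieces of that requirement in isolation: a watchdog that flags results older than ten minutes, and a screen that compares post-contingency flows with ratings. It is a hypothetical illustration: the dictionary formats and the use of a single rating per element are assumptions, and producing the post-contingency flows is the job of the state estimator and contingency analysis engine, which are not modeled here.

```python
from datetime import datetime, timedelta

MAX_INTERVAL = timedelta(minutes=10)


def is_stale(last_successful_run, now, max_interval=MAX_INTERVAL):
    """True if the last successful contingency analysis run is older than max_interval."""
    return now - last_successful_run > max_interval


def first_contingency_violations(post_contingency_flows, ratings_mva):
    """List (contingency, element, loading_ratio) for elements loaded above rating.

    post_contingency_flows: {contingency: {element: flow_mva}}
    ratings_mva: {element: rating_mva}; which rating applies is an assumption.
    Results are sorted with the most heavily loaded element first.
    """
    alarms = []
    for contingency, flows in post_contingency_flows.items():
        for element, flow_mva in flows.items():
            ratio = flow_mva / ratings_mva[element]
            if ratio > 1.0:
                alarms.append((contingency, element, ratio))
    return sorted(alarms, key=lambda alarm: alarm[2], reverse=True)


if __name__ == "__main__":
    last_run = datetime(2024, 1, 1, 12, 0)
    if is_stale(last_run, datetime(2024, 1, 1, 12, 15)):
        print("ALARM: contingency analysis results are more than 10 minutes old")
    flows = {"loss_of_line_X": {"line_Y": 1100.0, "line_Z": 400.0}}
    for contingency, element, ratio in first_contingency_violations(
        flows, {"line_Y": 1000.0, "line_Z": 500.0}
    ):
        print(f"ALARM: {contingency} loads {element} to {ratio:.0%} of rating")
```

The staleness check matters as much as the violation screen: a contingency tool that silently stops producing results gives operators no alarms at all, which is different from reporting that the system is secure.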

5. Corrective Actions to be Completed by MISO 

MISO shall complete the following corrective actions no later than June 30, 2004. 

1. Reliability Tools. MISO shall fully implement and test its topology processor to provide its 
operating personnel with a real-time view of the status of all operating transmission lines and all 
generating units within its system, and all critical transmission lines and generating units in 
neighboring systems. Alarms should be provided for operators for all critical transmission line 
outages. MISO shall establish a means of exchanging outage information with its members and 
neighboring systems such that the MISO state estimation has accurate and timely information to 
perform as designed. MISO shall fully implement and test its state estimation and real-time 
contingency analysis tools to ensure they can run reliably at least once every ten minutes. MISO 
shall provide backup capability for all functions critical to reliability. 

2. Visualization Tools. MISO shall provide its operating personnel with tools to quickly visualize 
system status and failures of key lines, generators, or equipment. The visualization shall include a 
high-level voltage profile of the systems at least within the MISO footprint. 

3. Training. Prior to June 30, 2004, MISO shall meet the operator training criteria stated in NERC 
Recommendation 6. 

4. Communications. MISO shall reevaluate and improve its communications protocols and procedures 
with operational support personnel within MISO, its operating members, and its neighboring control 
areas and reliability coordinators. 

5. Operating Agreements. MISO shall reevaluate its operating agreements with member entities to 
verify its authority to address operating issues, including voltage and reactive management, voltage 
scheduling, the deployment and redispatch of real and reactive reserves for emergency response, and 
the authority to direct actions during system emergencies, including shedding load. 
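The core idea behind the topology processor in item 1 (deriving which lines are in service from breaker status so that the state estimator works from the actual network, and alarming on critical line outages) can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical illustration and not MISO's implementation: real topology processors handle bus sections, disconnect switches, and breaker-and-a-half arrangements, and the line and breaker names here are invented. Treating missing telemetry as "open" is an assumption chosen so that gaps in data produce alarms rather than silence.

```python
# Hypothetical, simplified topology check: each line is mapped to the breakers
# at its terminals, and a line is treated as in service only if all of them
# are reported closed. Breakers with no telemetry are treated as open.


def line_status(line_breakers, breaker_closed):
    """Return {line: in_service} from breaker status telemetry."""
    return {
        line: all(breaker_closed.get(b, False) for b in breakers)
        for line, breakers in line_breakers.items()
    }


def critical_outage_alarms(status, critical_lines):
    """Critical lines currently out of service (or of unknown status)."""
    return sorted(line for line in critical_lines if not status.get(line, False))


if __name__ == "__main__":
    lines = {"L1": ["B1", "B2"], "L2": ["B3", "B4"], "L3": ["B5", "B6"]}
    breakers = {"B1": True, "B2": True, "B3": True, "B4": False, "B5": True, "B6": True}
    status = line_status(lines, breakers)
    for line in critical_outage_alarms(status, ["L2", "L3"]):
        print(f"ALARM: critical line {line} is out of service")
```

Exchanging outage information with members and neighbors, as item 1 also requires, amounts to keeping the breaker status and line-to-breaker mapping in such a check current for lines outside MISO's own telemetry.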
6. Corrective Actions to be Completed by PJM 

PJM shall complete the following corrective actions no later than June 30, 2004. 

Communications. PJM shall reevaluate and improve its communications protocols and procedures 
between PJM and its neighboring reliability coordinators and control areas. 